The term ‘stochastic processes’ is used of any
actual processes that have some statistical or 
chance element in their structure. (They may 
be physical processes like the phenomenon of 
Brownian motion, or the fluctuating numbers 
of electrons and photons in a cosmic ray shower, 
or biological processes like the growth of a 
bacterial colony.) The same term is also used 
for the precise mathematical representations or 
models of such observed phenomena. As pro- 
cesses of this kind are of universal occurrence, 
this book should interest a correspondingly wide 
range of specialist readers. Its primary aim is to 
acquaint the applied mathematician and statis- 
tician with the new mathematical and statistical 
techniques available for studying stochastic pro- 
cesses. Professor Bartlett has avoided too ab- 
stract an approach and has given a general survey 
of techniques and applications, including the 
results of recent research.
Different readers will be interested in particu- 
lar sections—for example, the statistician in the 
concluding chapters on the statistical analysis of 
stochastic processes, the communication engineer
in the chapters on stationary processes and
their relevance to prediction and communication
theory, and the biologist in the stochastic
models for population growth and epidemics. 
However, the book is intended as a self-con- 
tained and unified treatment to be read as a 
whole, and the general mathematical methods 
and techniques developed in the earlier chapters 
form the basis for any applications, physical or 
biological. 
PREFACE


The theory of stochastic processes may be regarded as the
‘dynamic’ part of statistical theory, with a multiplicity of
applications. Nevertheless, in spite of the importance and 
breadth of the subject, which has shown an accelerated progress 
in the last twenty-five years, there was at the time J. E. Moyal 
and I first planned (in 1946) to write a book on stochastic pro- 
cesses no general work available, as distinct from more specialized 
monographs. Our original plan was to survey the general theory
with especial reference to its uses and applications, both in
physics and statistics. However, the many new developments,
with their tremendous range from fundamental theory to
specific applications and techniques, both delayed the completion
of this project and made it difficult to confine it within
one book. We finally felt it more useful to split our contributions
into separate volumes. Mr Moyal, who has been actively interested
in the development of the basic mathematical theory,
would be responsible for dealing with this. A much more elementary
discussion of mathematical methods and statistical techniques,
addressed to the applied mathematician and statistician,
would be attempted in the present introductory work. Although
some reference could be made here to physical applications, a
systematic discussion of stochastic processes in physics would
require yet a third volume (which I hope Mr Moyal will write
after the completion of his book on the mathematical theory).
My own book is now much closer in aim and content to my 
North Carolina lecture notes on stochastic processes (circulated 
in mimeographed form in 1947), although of course considerably 
amplified and brought more up to date. Any attempt, even at 
this level, to survey the whole field must necessarily risk the 
criticism of omission or patchiness, but there nevertheless has
seemed to me to be a strong case for referring to so many topics
in the same volume, in order to stress unifying principles, and 
to demonstrate the frequent value of the same technique in 
different applications. My theoretical approach is at times
admittedly left rather formal and incomplete. In some sections
mathematical results have been summarized without proof; this
is not only for reasons of space but also because the proofs would 
not always fit in with the limited mathematical aim and scope 
of this volume. Those who require a more complete treatment 
from the mathematical point of view may be referred to Mr 
Moyal’s forthcoming book, or to other theoretical publications 
which have appeared in the last few years (for example, Lévy 
(1948), Doob (1953)). It would, however, be a pity if applied 
mathematicians or statisticians were put off from using some of 
the mathematical and statistical techniques available because 
they did not feel able to absorb all the more pure mathematical 
theory. As a statistician I find it at times rather exasperating 
when the mathematics of stochastic processes tends to become 
so abstract; time spent in wrestling with it can hardly be spared
unless, as of course mathematics is best fitted to do, it deepens
one’s perception of the over-all theoretical picture in the
probabilistic and statistical sense.

The placing of the examples and applications is somewhat
uneven, as they may be found with the relevant theory and
methods, or separately. In particular, if specifically physical
applications are mentioned there is no separation; thus cascade
showers are referred to as examples of multiplicative chains in
Chapter 3, whilst the closely related theory of population growth
is deferred until Chapter 4. Some compromise seemed essential,
and a strictly logical and uniform pattern was considered impracticable.
As no single application could be discussed at any
great length, any that appeared to require an excessive prior
familiarity or explanation has been omitted or merely mentioned.
The summary of communication theory included in
Chapter 7 is hardly an exception to such rules, as its principles
are quite general and not dependent on particular physical communication
systems. Some of the random walk and Markov
chain applications referred to in the earlier chapters have now
been rather fully treated by W. Feller in his recent book An
Introduction to Probability Theory and its Applications (vol. 1),†
but the restriction to discrete probabilities somewhat limited
the methods and class of problems he included.

† See Bibliography (for §1.3) at end.

In the concluding chapters I have attempted to survey methods
of statistical inference for stochastic processes, and in particular
for stationary time-series. Genuine examples here are
more scanty than I would like. This is partly a reflection of the 
more particularized character of stochastic process data, as 
emphasized in the text, so that suitable ‘stock examples’ are 
not so readily available. Many of these statistical methods of 
analysis are, moreover, comparatively new and still unfamiliar. 
It is hoped that they will, like the mathematical techniques 
developed in the earlier part of the book, be of interest to a wide 
class of reader. 

In spite of numerous references to original sources, it would be 
impossible in this wide survey to indicate in all cases the names 
of those first responsible for the various developments. Im- 
portant theoretical contributions have been made in particular 
by American, French, Russian and Swedish writers on prob- 
ability, the fundamental work of Kolmogorov perhaps calling 
most for explicit mention. In addition, however, to the syste- 
matic theory, the variety of individual and historical applications 
in physics, biology, medicine, economics or other fields, in many 
cases preceding and stimulating the general theory, should not 
be forgotten. Often in the text—for example, if more than one 
writer has contributed—general references have been deliber- 
ately omitted, but those most relevant are listed at the end of 
the book. This rule has also usually been adopted in regard to 
any references to my own work. A reference to Moyal (M) 
denotes the separate volume on the mathematical theory of 
stochastic processes mentioned above and not published at the 
time of writing. 

The ‘decimal system’ of numbering the chapter sections and 
subsections has been used. Equations have been numbered 
separately for each subsection. A Glossary of some of the 
standard types of stochastic processes appears just before the 
general Index. 

Finally, grateful acknowledgments are made to D. G. Kendall 
and J. E. Moyal, with whose work in recent years I am fortunate 
enough to have been in close contact, as will be evident in some 
parts of the book. Explicit acknowledgments have usually been 
inserted in the text for any unpublished work quoted—for 
example, work by research students. I am greatly indebted for 
a number of helpful comments or corrections to A. M. Walker,†
who very kindly read through most of the final draft, to J. E.
Moyal of course, but particularly in regard to §§ 3.5, 5.1 and 5.11;
also to D. V. Lindley, in §1.2. However, all the contents, including
its limitations and remaining slips,‡ are my own responsibility.
My sincere thanks are offered to Miss Barbara Appleby
for her careful preparation of the final typescript; to Mrs G. W.
Walls for typing an earlier draft; and to Mrs A. Linnert for
assistance with some of the tables and figures. Acknowledgments
are made to the Editor and publishers of Biometrika, for kind
permission to reproduce figs. 12, 13 and 14 from my paper in
vol. 37, pp. 1-16, and of Applied Statistics, for figs. 1, 6 and 7 from
my paper in vol. 2, pp. 44-64.


M.S.B.
August 1953


† Mr Walker, with Dr H. C. Gupta and P. A. Wallington, also kindly
assisted with proof-reading.

‡ A few slips have been corrected in the second impression. Further
corrections will always be gratefully received.


Chapter 1 


GENERAL INTRODUCTION 


1.1 Preliminary remarks

In this book we are going to consider a subject which in particular
applications has arisen since the beginnings of probability
theory, but the systematic treatment of which has only recently
begun to receive the attention it deserves. We may, roughly
speaking, think of this subject as the ‘dynamic’ part of statistical
theory, or the statistics of ‘change’, in contrast with the ‘static’
statistical problems which have hitherto been the more systematically
studied. By a stochastic process we shall in the first
place mean some possible actual, e.g. physical, process in the
real world, that has some random or stochastic element involved
in its structure. It will be convenient, however, also to use the
same phrase for the mathematical representation as well as the
physical concept, just as with the word ‘probability’, especially
here where we shall be mainly interested in the mathematical
theory in its role as a theory of statistical phenomena.

Many obvious examples of such processes are to be found in
various branches of science and technology, for example, the
phenomenon of Brownian motion, the growth of a bacterial
colony, or the fluctuating numbers of electrons and photons in
a cosmic-ray shower. In many of these examples the statistical
or random variables under study, such as the coordinates of a
Brownian particle, are changing with time, but change involving
any other parameter may arise; for example, a stochastic process
involving space parameters as well as time is the ‘velocity field’
of a turbulent fluid.

The mathematical theory which is the starting point of the
theoretical developments is the theory of probability, as this is
the basis of all statistical theory. In view of this central position
of the mathematical theory of probability, its elements are
summarized in the next section; but in view of the many controversial
discussions over its interpretation, it may be as well to
stress at once that we shall always use it as a theory about
statistical phenomena. There are many situations where observations
on particular phenomena can be repeated under similar
conditions, but where, however closely one attempts to control 
the conditions under which the observations are made, there are 
irregular or random variations between the results of different 
trials. Nevertheless, a survey of all the trials often indicates 
regularities which stabilize as the number of trials is increased; 
such regularities are called statistical properties. (The word 
‘trials’ is of course used here in a broad sense; thus in coin- 
tossing experiments we may consider either repeated tossing of 
the same coin or simultaneous tossing of many similar coins.) 
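
The stabilizing of a frequency ratio as the number of trials is increased can be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: the function name `running_relative_frequency`, the event probability of 0.5 (a fair coin) and the fixed seed are all hypothetical choices made here, not part of the text, and a pseudo-random generator stands in for actual trials.

```python
import random


def running_relative_frequency(num_trials, p_event=0.5, seed=0):
    """Relative frequency of an event after each of num_trials independent trials.

    p_event and seed are illustrative assumptions (a fair coin, a fixed seed).
    """
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    occurrences = 0
    frequencies = []
    for n in range(1, num_trials + 1):
        if rng.random() < p_event:  # the event occurs on this trial
            occurrences += 1
        frequencies.append(occurrences / n)  # frequency ratio after n trials
    return frequencies


if __name__ == "__main__":
    freqs = running_relative_frequency(100_000)
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}: relative frequency = {freqs[n - 1]:.4f}")
```

The early ratios fluctuate irregularly, while the later ones settle near the assumed probability; it is this kind of regularity that is meant by a statistical property.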


It is important to remember that while we make use of the
idea of the probability of an event at [...]
probability, at least in this st[...]
meaning only in relation to [...]
It may in fact be measured, [...]
frequency of times the event has occurred after a reasonably large
[...]uld arise from this last [...]
it can only be a rough [...]
actual events, but it can [...]
[...]ng if referring to a hypothetical
or conceptual model for such events.


1.2 Elements of probability theory†


We first consider a finite number $k$ of mutually exclusive ‘elementary
events’ $A_i$, one of which must occur on a ‘trial’. Now in
practice if we had the results of actual trials, we should have
empirical frequency ratios $r_i/n$, such that the sum $\sum_i r_i/n = 1$.
Correspondingly in our conceptual and axiomatic theory we
postulate probability numbers $p_i$, such that
$\sum_i p_i = 1$. Symbolically, we can write for the entire probability
distribution

    $\sum_i p_i A_i \equiv p_1 A_1 + p_2 A_2 + \dots + p_k A_k.$    (1)

† §1.2 (and §1.21) may be omitted by a reader already familiar with
probability theory.

Further, for the derived probability $p$, say, of ‘either $A_1$ or $A_2$’,
we must necessarily, from the correspondence with frequency,
postulate the addition law for mutually exclusive events

    $p = p_1 + p_2,$    (2)

or in general probability notation (using the symbol $A_1 + A_2$ for
$A_1$ or $A_2$)

    $P\{A_1 + A_2\} = P\{A_1\} + P\{A_2\}.$    (3)


Suppose now the elementary events $A_i$ are grouped in two
distinct ways. In the first we have resulting mutually exclusive
and exhaustive classes or sets $B_i$, say, such that $B_1$ is the sum
of some of the $A_i$, $B_2$ the sum of some $A_i$ not included in $B_1$, and
so on. Then by use of the addition law we easily see that we have
a new distribution $\sum_i p_{i.} B_i$, where $p_{i.} = P\{B_i\}$. Similarly in the
second grouping let the new distribution be $\sum_j p_{.j} C_j$. We will
next consider the probability $p_{ij}$ to be attached to the symbolic
product $B_i C_j$, by which we mean the set of all elementary events
common to both $B_i$ and $C_j$. It is evident from their construction
that the composite events $B_i C_j$, over all $i$ and $j$, are also mutually
exclusive and exhaustive, so that we have a third distribution
$\sum_{i,j} p_{ij} B_i C_j$.

We have further by the addition law

    $p_{i.} = \sum_j p_{ij}, \qquad p_{.j} = \sum_i p_{ij}.$    (4)

The identity

    $p_{ij} = p_{i.}\,(p_{ij}/p_{i.}),$    (5)

known as the multiplication law, defines (for $p_{i.} \neq 0$) a new
quantity $p_{ij}/p_{i.}$, which obviously lies between 0 and 1 and from
(4) satisfies the relation $\sum_j (p_{ij}/p_{i.}) = 1$. Now for actual frequency
ratios $r_{ij}/n$, $r_{i.}/n$, $r_{.j}/n$, the corresponding identity would read

    $r_{ij}/n = (r_{i.}/n)\,(r_{ij}/r_{i.}),$    (6)

where the second factor denotes the frequency ratio of the composite
event $B_i C_j$ for those trials in which the event $B_i$ occurred.
Hence we call the quantity $p_{ij}/p_{i.}$ the conditional probability of
$B_i C_j$ for given $B_i$ (or more simply of $C_j$ for given $B_i$). In terms of
the notation (3) this is written

    $P\{B_i C_j\} = P\{B_i\}\,P\{C_j \mid B_i\}.$    (7)
The probabilities $P\{B_i\}$ and $P\{C_j\}$ are called independent when
$P\{C_j \mid B_i\} = P\{C_j\}$, whence it readily follows that $P\{B_i \mid C_j\} = P\{B_i\}$.
The relation (5) then becomes simply

    $p_{ij} = p_{i.}\,p_{.j}.$    (8)

Two entire distributions are independent if (8) is true for all
$i$ and $j$, whence

    $\sum_{i,j} p_{ij} B_i C_j = \left(\sum_i p_{i.} B_i\right)\left(\sum_j p_{.j} C_j\right).$    (9)

If a value $x_i$ is associated with the event $B_i$ we say that the
random variable $X$ has a probability $P\{X = x_i\} = p_{i.}$ of having
the realized value $x_i$. If another random variable $Y$ is associated
with $C_j$, then the simultaneous or joint distribution of $X$ and $Y$
is specified by the $p_{ij}$. If (8) holds, $X$ and $Y$ are called independent.
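
The addition law, the multiplication law and the independence condition can be checked on a small joint table of probabilities $p_{ij}$. The following is a minimal sketch, not a definitive implementation: the function names and the two 2 x 2 tables are hypothetical examples chosen here, and exact fractions are used so that the equality in the independence condition can be tested without rounding error.

```python
from fractions import Fraction


def marginals(p):
    """Row totals p_i. and column totals p_.j of a joint table p[i][j] (addition law)."""
    row_totals = [sum(row) for row in p]
    col_totals = [sum(col) for col in zip(*p)]
    return row_totals, col_totals


def conditional_given_row(p, i):
    """Conditional probabilities p_ij / p_i. of C_j for given B_i (needs p_i. != 0)."""
    row_total = sum(p[i])
    if row_total == 0:
        raise ValueError("conditional probability is undefined when p_i. = 0")
    return [p_ij / row_total for p_ij in p[i]]


def is_independent(p):
    """True when p_ij == p_i. * p_.j holds for every i and j."""
    row_totals, col_totals = marginals(p)
    return all(
        p[i][j] == row_totals[i] * col_totals[j]
        for i in range(len(p))
        for j in range(len(p[i]))
    )


if __name__ == "__main__":
    # Hypothetical joint tables: the first factorizes, the second does not.
    independent_table = [[Fraction(1, 8), Fraction(3, 8)],
                         [Fraction(1, 8), Fraction(3, 8)]]
    dependent_table = [[Fraction(1, 2), Fraction(0)],
                       [Fraction(0), Fraction(1, 2)]]
    for table in (independent_table, dependent_table):
        print(marginals(table), conditional_given_row(table, 0), is_independent(table))
```

In the second table the conditional probabilities for given $B_1$ differ from the column totals, which is exactly the failure of the independence condition.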


1.21 Distribution functions and their properties. The
above elementary rules contain in essence [...]
theory, but they have the limitation [...]
[...]quency ratios obtained from actual trials.†

In this generalization, based on the mathematical theory of
measure and due to Kolmogorov, the additive function $P\{A_i\}$
becomes a completely additive set function, such that the probability
of any sum of a finite or enumerable sequence of sets is
uniquely defined with a meaning consistent with the elementary
[...]


which refer of course to 
require some appropriate specification 
ls as a whole. It is simplest and usually 


stochastic process theory has shown that 
is permissible. 


[...] We shall not consider this generalization in detail
but merely note some of the definitions and results that will
be useful. For a rigorous discussion the reader is referred to
Kolmogorov (1933), Cramér (1937, 1946) or Moyal (M).

If a random variable $X$ has possible (real) values $x$ in the continuous
range from $-\infty$ to $\infty$, we must postulate from our elementary
theory probabilities $P\{x_1 < X \le x_2\}$ for some finite set of
such intervals. Our more general starting point is that
probabilities given for all such finite sets define a unique
probability measure $P\{B_x\}$ for a completely additive class of sets
$B_x$ including such intervals (the smallest such completely
additive class of sets is known as the class of Borel sets, and
provides a sufficiently wide generalization). The most important
values of the set function $P\{B_x\}$ are those corresponding to the
sets defined by $X \le x$. These are identical with the values of an
ordinary function $F(x)$, say, of the single value $x$; conversely,
$F(x)$ uniquely specifies $P\{B_x\}$. We call

    $P\{X \le x\} \equiv F(x)$    (1)

the cumulative probability or distribution function of $X$. The
properties of $F(x)$ are as follows. It is a never decreasing positive
function of $x$ (continuous everywhere to the right), tending to
0 as $x \to -\infty$ and to 1 as $x \to \infty$. It has at most an enumerable set
of discontinuities, and is differentiable ‘almost everywhere’,
i.e. except on a set of (Lebesgue) measure zero. $F(x)$ can always
be expressed as the sum of three positive components

    $F(x) = c_1^2 F_1(x) + c_2^2 F_2(x) + c_3^2 F_3(x),$    (2)

where $c_1^2 + c_2^2 + c_3^2 = 1$, and $F_1(x)$, $F_2(x)$, $F_3(x)$ have the following
properties:

(i) $F_1(x)$ is a ‘step’ function equal to the sum of the jumps
of $F_1(x)$ at all points of discontinuity less than or equal to $x$, i.e.

    $F_1(x) = \sum_{x_r \le x} \{F_1(x_r + 0) - F_1(x_r - 0)\} = \sum_{x_r \le x} p_r.$    (3)

(ii) $F_2(x)$ is an absolutely continuous function

    $F_2(x) = \int_{-\infty}^{x} f(u)\,du,$    (4)

where $f(x)$ is the probability density or frequency function
corresponding to $F_2(x)$.
(iii) $F_3(x)$ is the ‘singular’ component, which is continuous
but has zero derivative almost everywhere. We shall assume that
this third component is absent unless the contrary is stated, as
it does not arise in practice. Many distributions (though by no
means all) are, moreover, either of the type $F_1(x)$ or $F_2(x)$, and
may then be classified without ambiguity as ‘discrete’ or ‘continuous’
respectively. In such cases it will sometimes be convenient
to write in the ‘discrete’ case

    $P\{X = x\} \equiv p(x) = p_r$ ($x = x_r$), $= 0$ ($x \neq$ any $x_r$);    (5)

and in the ‘continuous’ case

    $P\{X \text{ in } x, x + dx\} \equiv p(x) = f(x)\,dx.$    (6)

In both these cases this amounts to a formal identification of
$p(x)$ with $dF(x)$, where $F(x)$ is the Stieltjes integral

    $F(x) = \int_{-\infty}^{x} dF(u)$    (7)

(the integral, like those below, is a Lebesgue-Stieltjes integral,
corresponding to the probability measure postulated, though
the distinction between Riemann and Lebesgue integration does
not arise in most practical problems).

There is no fundamental distinction between scalar or vector
random variables. For the latter we write

    $P\{\mathbf{X} \le \mathbf{x}\} \equiv P\{X_i \le x_i \text{ for } i = 1, \dots, n\} = F(x_1, x_2, \dots, x_n) \equiv F(\mathbf{x}).$

For vector variables we can have distributions with pure probability
densities $f(\mathbf{x})$ or purely discrete distributions, but other
cases, such as a two-dimensional probability density in a three-dimensional
space, may also arise.

The average or expected value of an ordinary function $w(X)$
of the random variable $X$ is defined as the (absolutely convergent)
integral

    $E\{w(X)\} = \int_{-\infty}^{\infty} w(x)\,dF(x),$    (8)

and similarly for functions $w(\mathbf{X})$ of a vector random variable.
In particular, we have the moment formulae

    Mean $m = E\{X\}$,   Mean square $E\{X^2\}$,
    Variance $\sigma^2 = E\{(X - m)^2\}$,

and for a vector variable

    Product or bilinear moment $E\{X_i X_j\}$,
    Covariance $w_{ij} \equiv \rho_{ij}\sigma_i\sigma_j = E\{(X_i - m_i)(X_j - m_j)\}$,
    Covariance matrix $\mathbf{V} = E\{(\mathbf{X} - \mathbf{m})(\mathbf{X} - \mathbf{m})'\}$,

where in the last expression $\mathbf{m} = E\{\mathbf{X}\}$ and $\mathbf{X}$ is written as a column
vector or matrix with $\mathbf{X}'$ its transpose. The standard deviation
$\sigma$ is the positive square root of the variance and the correlation
coefficient $\rho_{ij}$ is defined as the bilinear moment between the two
variables $(X_i - m_i)/\sigma_i$ and $(X_j - m_j)/\sigma_j$.

The moment-generating function (m.g.f.) of $X$, when it exists,
is given by

    $M(\theta) = E\{e^{\theta X}\},$    (9)

where $\theta$ is an auxiliary variable. This function always exists and
is, moreover, a uniformly continuous function of $\theta$ when $\theta$ is
pure imaginary, $i\phi$ say. It is then a function $C(\phi)$ of the real variable
$\phi$ and is known as the characteristic function. The $r$th derivative
(if it exists) of $C(\phi)$ at $\phi = 0$ gives the quantity $i^r E\{X^r\}$. $C(\phi)$
and $F(x)$ are determined uniquely one from the other, the general
inversion formula from $C(\phi)$ to $F(x)$ being

    $\Delta_h F(x) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{1 - e^{-i\phi h}}{i\phi}\, e^{-i\phi x}\, C(\phi)\,d\phi,$    (10)

where $\Delta_h F(x) \equiv F(x + h) - F(x)$ for any function $F(x)$, and $F(x)$ is
assumed continuous at $x$ and $x + h$. For a vector variable $\mathbf{X}$
we define

    $M(\boldsymbol{\theta}) = E\{e^{\boldsymbol{\theta}'\mathbf{X}}\}.$    (11)

The logarithm of $M(\theta)$ defines the cumulant or semi-invariant
function $K(\theta)$, whose $r$th derivative (when it exists) at $\theta = 0$ is
the $r$th cumulant or semi-invariant $\kappa_r$. In particular,
$\kappa_1 = m = E\{X\}$, $\kappa_2 = \sigma^2$, and in the case of a vector variable
$\kappa_{11} = w_{ij} = \rho_{ij}\sigma_i\sigma_j$.

The normal or Gaussian distribution for a vector variable $\mathbf{X}$ may
be simply defined in terms of its cumulant function, which is

    $K(\boldsymbol{\theta}) = \boldsymbol{\theta}'\mathbf{m} + \tfrac{1}{2}\boldsymbol{\theta}'\mathbf{V}\boldsymbol{\theta}.$    (12)
Its distribution function, which may be obtained from $C(\boldsymbol{\phi})$ by
inversion, is equivalent to a density function

    $f(\mathbf{x}) = (2\pi)^{-n/2}\,|\mathbf{V}|^{-1/2} \exp\{-\tfrac{1}{2}(\mathbf{x} - \mathbf{m})'\mathbf{V}^{-1}(\mathbf{x} - \mathbf{m})\},$    (13)

where $|\mathbf{V}|$ denotes the determinant of $\mathbf{V}$. Any linear transformation
on $\mathbf{X}$ still yields a normal distribution function.

In the special case of a discrete random variable $X$ with
possible values 0, 1, 2, ..., it is convenient also to define the
probability-generating function (p.g.f.) by

    $\Pi(z) = E\{z^X\} = M(\log z).$    (14)

Examples are the binomial distribution, for which

    $\Pi(z) = [1 + p(z - 1)]^n,$    (15)

and the Poisson distribution, a limiting case of (15) but extremely
important in its own right in the theory of stochastic processes,

    $\Pi(z) = e^{m(z - 1)}.$    (16)
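
That the Poisson p.g.f. $e^{m(z-1)}$ arises as a limiting case of the p.g.f. $[1 + p(z - 1)]^n$ can be checked numerically by holding the mean $np = m$ fixed while $n$ increases. The sketch below is illustrative only; the function names and the values $m = 2$, $z = 0.5$ are assumptions made here and do not come from the text.

```python
import math


def binomial_pgf(z, n, p):
    """P.g.f. [1 + p(z - 1)]^n of a binomial distribution."""
    return (1.0 + p * (z - 1.0)) ** n


def poisson_pgf(z, m):
    """P.g.f. exp{m(z - 1)} of a Poisson distribution with mean m."""
    return math.exp(m * (z - 1.0))


if __name__ == "__main__":
    m, z = 2.0, 0.5  # illustrative values, chosen only for this example
    limit = poisson_pgf(z, m)
    for n in (10, 100, 1_000, 10_000, 1_000_000):
        value = binomial_pgf(z, n, m / n)  # keep the mean n*p fixed at m
        print(f"n = {n:>9}: binomial p.g.f. = {value:.8f}, difference = {value - limit:+.2e}")
    print(f"Poisson p.g.f. = {limit:.8f}")
```

The difference shrinks roughly in proportion to $1/n$, in keeping with the Poisson distribution being the limit of the binomial for large $n$ and small $p$.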


If two random variables $X$ and $Y$ are independent, we have
$F(x, y) = F_X(x)\,F_Y(y)$ for all $x$ and $y$. It follows that

    $C(\phi_1, \phi_2) = C_X(\phi_1)\,C_Y(\phi_2).$

Putting $\phi_1 = \phi_2 = \phi$, we obtain the characteristic function of the
random variable $Z = X + Y$. The distribution of $Z$ when $X$ and $Y$
are independent is called the ‘convolution’ of the separate
distributions $F_X(x)$ and $F_Y(y)$, and is given more directly by

    $F_Z(z) = \int_{-\infty}^{\infty} F_X(z - y)\,dF_Y(y).$    (17)
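
For discrete distributions the convolution integral reduces to a sum over the possible values, $P\{Z = z\} = \sum_y P\{X = z - y\}\,P\{Y = y\}$. A minimal sketch of this discrete case follows; the function name `convolve` and the example of two fair dice are hypothetical illustrations introduced here, not taken from the text.

```python
from fractions import Fraction


def convolve(px, py):
    """Distribution of Z = X + Y for independent discrete X and Y.

    px and py map each possible value to its probability.
    """
    pz = {}
    for x, p_x in px.items():
        for y, p_y in py.items():
            # Independence: P{X = x, Y = y} = P{X = x} P{Y = y}.
            pz[x + y] = pz.get(x + y, 0) + p_x * p_y
    return pz


if __name__ == "__main__":
    die = {face: Fraction(1, 6) for face in range(1, 7)}  # a fair die (assumed example)
    total = convolve(die, die)
    for value in sorted(total):
        print(value, total[value])
```

Because the characteristic function of a sum of independent variables is the product of the separate characteristic functions, repeated use of `convolve` corresponds to repeated multiplication of characteristic functions.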


An important theorem known as the Central Limit Theorem
is concerned with the tendency of the distribution of a sum of
independent random variables $S_n = X_1 + X_2 + \dots + X_n$ to become
normal as $n$ increases. In the particular case when the $X$’s have
the same distribution, the relevant result is that the random
variable $(S_n - nm)/(\sigma\sqrt{n})$ tends to normality with zero mean and
unit variance provided $m$ and $\sigma^2$ ($\neq 0$) exist as finite quantities. In
the general case, a sufficient condition which is useful is that in
addition to the existence of $m_i$ and $\sigma_i^2$ ($\neq 0$) in the ‘standardized’
random variable $(S_n - \sum_i m_i)/\sqrt{\sum_i \sigma_i^2}$, we have $E\{|X_i|^k\} < A$,
say, for all $i$ and some $k > 2$. The method of proof usually makes
use of the ‘continuity’ theorem for characteristic functions, that
if (i) a sequence of characteristic functions $C_n(\phi)$ converges to
$C(\phi)$ for all $\phi$, and moreover (ii) the limit $C(\phi)$ is continuous at
$\phi = 0$ (or (ii′) the convergence is uniform within $|\phi| < a$ for some
$a > 0$), then the corresponding sequence of distribution functions
$F_n(x)$ tends to a valid limiting distribution $F(x)$ (at all points of
continuity of $F(x)$), where $C(\phi)$ is the characteristic function
of $F(x)$.
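
The tendency to normality of the standardized sum $(S_n - nm)/(\sigma\sqrt{n})$ can be illustrated by simulation. The sketch below is illustrative and rests on assumptions made here: the $X$'s are taken to be uniform on (0, 1), so that $m = 1/2$ and $\sigma^2 = 1/12$, the function names are hypothetical, and the check used is the proportion of standardized sums lying within $\pm 1.96$, which is close to 0.95 for a unit normal variable.

```python
import math
import random


def standardized_sum(n, rng):
    """(S_n - n*m) / (sigma * sqrt(n)) for n independent uniform(0, 1) variables."""
    m, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and standard deviation of uniform(0, 1)
    s_n = sum(rng.random() for _ in range(n))
    return (s_n - n * m) / (sigma * math.sqrt(n))


def fraction_within(limit, n, replications, seed=0):
    """Proportion of simulated standardized sums with absolute value at most limit."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    hits = sum(
        1 for _ in range(replications) if abs(standardized_sum(n, rng)) <= limit
    )
    return hits / replications


if __name__ == "__main__":
    for n in (1, 2, 5, 30):
        print(f"n = {n:>2}: proportion within 1.96 = {fraction_within(1.96, n, 20_000):.4f}")
```

For $n = 1$ the standardized uniform variable never exceeds $\sqrt{3}$ in absolute value, so the proportion is 1; as $n$ grows the proportion approaches the normal value.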

In the case where $X$ and $Y$ are not independent, we define the
conditional probability $G(y \mid x)$ of $Y \le y$ when $X = x$ from the
relation

    $F(x, y) = \int_{-\infty}^{x} G(y \mid u)\,dF_X(u),$    (18)

where the marginal distribution $F_X(x) = F(x, \infty)$. Notice that
although for density functions $P\{X = x\} = 0$, the relation (18)
still permits the definition of $G(y \mid x)$. In such a case, (18) is
equivalent to a density relation

    $f(x, y) = g(y \mid x)\,f_X(x).$    (19)

A conditional average of $w(Y)$, given $X = x$, is defined as

    $E\{w(Y) \mid x\} = \int_{-\infty}^{\infty} w(y)\,dG(y \mid x).$    (20)

Conditional probabilities and averages will be of fundamental
importance in the theory of stochastic processes, as we are often
interested in the future behaviour of random variables, given the
realized values of variables already observed in the past.


1.3 Theoretical classification and specification of
stochastic processes

We have noted that a random variable $X$ may often be confined
to a finite or enumerable discrete set of values, and if we
consider some stochastic process characterized by a random
variable changing with the time $t$, it is sometimes convenient to
consider separately cases where the random variable $X$ is
‘discrete’, for example, the size of a population, with possible
values 0, 1, 2, .... However, it is now important to consider also
the nature of the parameter $t$. If this is ‘discrete’, e.g.
$t = \dots, -2, -1, 0, 1, 2, \dots$, our stochastic process must be
specified by the simultaneous probability distribution of any
vector set of $X$’s, $X_1 \equiv X(t_1), X(t_2), \dots, X(t_n)$, say. While this
involves in the limit an extension to vector random variables
with an enumerably infinite set of components, this creates no
new difficulty, being already encountered in the classical study
of sequences of independent events. We shall sometimes refer
to such stochastic processes as random sequences.


(i) If such a stochastic process $X(t)$ is properly specified from
the theoretical point of view, it follows that for any arbitrary
finite set of possible values $t_r$ ($r = 1, \dots, n$) of the parameter $t$, we
shall have a vector random variable $\mathbf{R}_n \equiv (X_1, X_2, \dots, X_n)$, with
a valid distribution function $F_n(\mathbf{r})$. Validity here implies more
than the validity of $F_n(\mathbf{r})$ for a fixed set of values $t_r$; for example,
$F_3(x_1, x_2, x_3)$ is a function also of $t_1, t_2, t_3$, and may be written more
fully $F_3(x_1, x_2, x_3; t_1, t_2, t_3)$, say. We may obtain $F_2(x_1, x_2; t_1, t_2)$
as $F_3(x_1, x_2, \infty; t_1, t_2, t_3)$, and this implies that the latter function
should not depend any longer on $t_3$.

(ii) A specification as in (i) is necessary, but it has been shown
(for example, by Doob (1937)) that without some restriction on the
regularity of the possible realized functions $x(t)$ to be [...]
it is not sufficient to answer all questions abou[...]
such as the probability that $X(t)$ [...]
But such a restriction is no real limitation [...]
we should endeavour to specify our theoretical models accordingly
(cf. Moyal (M)). Such a specification in effectively an
enumerable set of variables $X_1, X_2, \dots$ will be relevant also when
we later consider problems of statistical inference for stochastic
processes and may sometimes be more conveniently accomplished
by a change in the variables considered; for example, if
we have events occurring independently and randomly in time, a
specification of the random function $N(t)$, representing the total
number of events at time $t$, could be given in terms of the random
times $T_1, T_2, \dots$ at which $N = 1, 2, \dots$.

In some cases such a change of variable may show that the
effective number of dimensions or ‘degrees of freedom’ of the
process is still finite. For example, the stochastic process defined
by the harmonic terms

    $X(t) = A\cos\lambda t + B\sin\lambda t,$    (1)

where $\lambda$ is a constant, and $A$ and $B$ normal or Gaussian random
variables with zero means, unit variances and zero correlation
coefficient, has only two degrees of freedom. It is evident from
the property that linear combinations of $A$ and $B$ still give a
joint normal distribution (though here a degenerate one) that
any set of $X$’s is jointly normal, a property which characterizes
what is called a normal stochastic process. Such a process has a
cumulant function for the $X$’s involving only their means and
covariance matrix. The means are here zero and further

    $E\{X_r X_s\} = \cos\lambda(t_r - t_s).$    (2)

It may be noted that the function (2) (and the mean of $X$)
involve $t$ only through the differences $t_r - t_s$. Any distribution
function $F_n(\mathbf{r})$ will thus in this example also involve $t$ only
through the differences $t_r - t_s$, and not depend on our time origin.
This property characterizes an important type of process called
a stationary process, to be discussed further in Chapter 6. Processes
which are not stationary may sometimes be called
evolutionary. For example, if for the same random constants
$A$ and $B$ we define

    $Y(t) = A + Bt,$    (3)

then the properties of $Y(t)$ change as $t$ increases from zero. We
may also note a further property of the ‘degenerate’ processes (1)
and (3), that if $X(t)$ is known at times $t_0$ and $t_1$, say, the realized
values of $A$ and $B$ can be deduced and the entire future behaviour
of $X(t)$ predicted exactly. These are examples of deterministic
processes; our main interest will be in non-deterministic processes
whose future behaviour cannot be completely predicted
in terms of past observations.
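
Both properties of the harmonic process $X(t) = A\cos\lambda t + B\sin\lambda t$ noted above can be checked numerically: its product moment depends on the times only through their difference, and two observations determine $A$ and $B$ and hence the whole future. The sketch below is illustrative only; the function names, the value of $\lambda$, the observation times and the seed are assumptions made here, and the product moment is estimated by simulation rather than computed exactly.

```python
import math
import random


def harmonic_process(a, b, lam, t):
    """Value of X(t) = A cos(lam*t) + B sin(lam*t) for realized A = a, B = b."""
    return a * math.cos(lam * t) + b * math.sin(lam * t)


def estimated_product_moment(lam, t_r, t_s, replications=50_000, seed=0):
    """Monte Carlo estimate of E{X(t_r) X(t_s)} with A, B independent unit normals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        total += harmonic_process(a, b, lam, t_r) * harmonic_process(a, b, lam, t_s)
    return total / replications


def recover_coefficients(lam, t0, x0, t1, x1):
    """Solve the two linear equations X(t0) = x0, X(t1) = x1 for A and B."""
    det = math.sin(lam * (t1 - t0))  # determinant of the 2 x 2 system
    if abs(det) < 1e-12:
        raise ValueError("the two observation times do not determine A and B")
    a = (x0 * math.sin(lam * t1) - x1 * math.sin(lam * t0)) / det
    b = (x1 * math.cos(lam * t0) - x0 * math.cos(lam * t1)) / det
    return a, b


if __name__ == "__main__":
    lam = 0.7  # illustrative constant
    # Same time difference, different time origin: the estimates should agree closely.
    print(estimated_product_moment(lam, 1.0, 3.5), estimated_product_moment(lam, 11.0, 13.5))
    print(math.cos(lam * (1.0 - 3.5)))
    # Deterministic prediction from two observations of one realization.
    a, b = recover_coefficients(lam, 0.2, harmonic_process(1.3, -0.4, lam, 0.2),
                                1.1, harmonic_process(1.3, -0.4, lam, 1.1))
    print(a, b, harmonic_process(a, b, lam, 10.0))
```

The recovery fails only when $\sin\lambda(t_1 - t_0) = 0$, that is, when the two observation times are a whole number of half-periods apart.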

We shall not continue with any complete catalogue of particular
types of processes, as it is more convenient to introduce
them as they arise (see also Glossary). For example, the historically
and practically important processes known as Markov
processes will be first discussed in Chapter 2. One general point
of interpretation should be noted. It was suggested that a process
in the real world would usually be thought of in the actual ‘process
of change’, like a growing population or a physical system in
motion. We shall find this idea relevant to much of the theoretical
development, for example, to the concept of a Markov process
(as a process with no memory extending before the previous
instant) or to problems of prediction. But a random function
$X(t)$ does not in general convey the notion of any such ‘dynamic’
aspect; it may alternatively refer to a random variable depending
on some more ‘static’ parameter $t$. We have already cited the
space parameters in turbulence. In the model for a population
we shall see that it is necessary to specify the number of individuals
not only at a particular epoch but also of a given age.
Although the idea of a process may be more relevant in some
contexts than others, it is convenient to retain the general term
stochastic processes in all cases.

For definiteness, we have so far in this section been considering
the mathematical represent[...]
involving one (real) random qua[...]
auxiliary parameter $t$. There is stric[...]
speaking of quantitative variables, si[...]
with ‘states’ or ‘attributes’ can alw[...]
random variables taking the value[...]
non-occurrence or occurrence of th[...]
however, sometimes wish to deal [...]
$X$ and (or) more than one parameter. As physical examples,
we may wish to deal simultaneously with the position $X(t)$,
$Y(t)$, $Z(t)$ and velocity $U(t)$, $V(t)$, $W(t)$ of a given particle, or to
consider the velocity $U(x, y, z, t)$, $V(x, y, z, t)$, $W(x, y, z, t)$ of a
fluid for given space and time coordinates. There is no essential
difficulty in extending definitions and methods to deal with such
cases. Processes involving more than one random variable in
this sense will be called multivariate, simultaneous or vector
processes, the term multidimensional processes or fields being
reserved for processes depending on more than one parameter.
It is even sometimes convenient to introduce complex quantities
and to consider $Z(t) \equiv X(t) + iY(t)$, or $X(z) \equiv X(x + iy)$, but such
representations are of course introduced purely for mathematical
convenience or generality and not as direct models of ‘real’
physical processes.


1.31 The characteristic functional. In § 1.21 was noted the
theoretical equivalence of a distribution function $F(x)$ and the
characteristic function $C(\theta) = E\{e^{i\theta X}\}$, or, for a
vector random variable, $F(\mathbf{x})$ and $C(\boldsymbol{\theta})$.
It is possible to obtain the characteristic function
$C_n(\boldsymbol{\theta})$ corresponding to any $n$-dimensional random
variable $\mathbf{R}_n$ extracted from a stochastic process $X(t)$.
This is the expectation of $e^{iS_n}$, where $S_n$ is the scalar
product of the random vector $\mathbf{R}_n$ and the corresponding
mathematical vector $\boldsymbol{\theta}$. Formally at least $S_n$
may be replaced by the more general ‘product sum’

$$S = \int X(t)\, d\theta(t), \tag{1}$$

the previous quantities $S_n$ being particular cases of $S$ given by
particular choices of the arbitrary function $\theta(t)$. The
functional $C(\theta)$ may be considered as another possible method
of specifying a stochastic process theoretically. While having a
less direct interpretation, it is potentially useful as a concise
form of definition which moreover lends itself to mathematical
manipulations, especially linear transformations or operations on the
random function $X(t)$. For example, for the process

$$X(t) = A\cos\lambda t + B\sin\lambda t$$

defined in the last section, we easily find

$$\log C(\theta) = -\tfrac{1}{2}\iint \cos\lambda(t-\tau)\, d\theta(t)\, d\theta(\tau). \tag{2}$$




In some cases it is useful to introduce the characteristic
functional of a process in relation to parameters other than the
time. Examples are those previously referred to such as the space
parameters in the vector velocity field of a turbulent fluid, or the
age parameter in a population. In the latter case it is convenient
to specify the cumulative number $N(x)$ of persons of age less than
or equal to $x$, and introduce an alternative form of generalized
‘product sum’ to (1), viz.

$$S = \int \theta(x)\, dN(x). \tag{3}$$

This is because the number $N(x)$ is necessarily discontinuous with
increment either 0 or a positive integer at each point $x$. The
various methods available for handling this last type of process,
sometimes referred to as a point process, will be discussed in more
detail later (see Chapters 3 and 4).
Chapter 2
RANDOM SEQUENCES


2.1 The random walk

The simplest examples of stochastic processes are discussed in most
text-books on probability. These are random sequences in which the
variable $X_r$ is independent of the entire previous set of X's. The
interest of such sequences lies in the properties of derived series
or sequences, such as the cumulative sums

$$S_n = X_1 + X_2 + \dots + X_n. \tag{1}$$

The sequence $S_n$ is called a ‘random walk’, as it represents the
position at time $t_n$ of a person taking a random step $X_r$
independently of his previous ones. As a special case we may have

$$X_r = \pm 1.$$

An example of such a process goes back to the gambling problems of
the seventeenth century. We may suppose that two gamblers A and B
play a sequence of games, the probability of A winning a particular
game being $p$. If he wins this game, he acquires a unit stake from
B and if he loses the game, he loses his own unit stake.

The distribution of the unrestricted sum $S_n$ is the n-fold
convolution of the distribution function $F(x)$ of each $X_r$, or
equivalently in terms of cumulant functions

$$K_n = nK, \tag{2}$$

where $K_n$ is the cumulant function of $S_n$ and $K$ that of each
$X_r$. As referred to in § 1.21, the distribution function of $S_n$
(suitably scaled to zero mean and finite variance) tends to normality
as $n$ increases, provided the mean $m$ and variance $\sigma^2$ (> 0)
of the common distribution of each $X_r$ exist.
In the gambling problem a point of interest was the nature of the
sequence if $S_n$ was limited by the initial capital of each gambler,
and, in particular, the ‘duration of play’. This problem has acquired
further interest in its modern application to sequential sampling
(see Chapter 4) or in its relation to physical diffusion processes
with boundary conditions. Let us consider therefore boundaries
$a$ (< 0) and $b$ (> 0), so that if $S_n$ first reaches or goes
outside the ends of the interval $(a, b)$ at time $t_n$, the process
is terminated. Denote the modified ‘distribution function’ of $S_n$
by $F_n(x)$, i.e.

$$F_n(x) = P\{S_n \le x \text{ and } a < S_r < b \ (r = 1, 2, \dots, n-1)\}.$$

Then we have the recurrence relation

$$F_n(x) = \int_{a+}^{b-} F(x-y)\, dF_{n-1}(y), \tag{3}$$

since $F_{n-1}(y)$ only contributes to $F_n(x)$ for $a < y < b$.
(Here $F_n(x)$ is a distribution function only in an extended sense;
owing to previous absorption at the boundaries its total integral is
in general no longer unity.) The probability $p_n$ of reaching the
boundary $b$ for the first time at $n$ is
$F_n(\infty) - F_n(b-)$, and similarly for $a$ we write
$p_n' = F_n(a) - F_n(-\infty)$. If

$$P(\lambda) = \sum_{n=1}^{\infty} p_n \lambda^{n-1}, \qquad P'(\lambda) = \sum_{n=1}^{\infty} p_n' \lambda^{n-1}, \tag{4}$$

then

$$P(\lambda) = G(\infty, \lambda) - G(b-, \lambda), \qquad P'(\lambda) = G(a, \lambda) - G(-\infty, \lambda), \tag{5}$$

where

$$G(x, \lambda) = \sum_{r=1}^{\infty} \lambda^{r-1} F_r(x) \tag{6}$$

satisfies the Fredholm integral equation

$$G(x, \lambda) = F(x) + \lambda \int_{a+}^{b-} F(x-y)\, dG(y, \lambda). \tag{7}$$

This equation, due to Samuelson (1948), may in principle be solved
(e.g. by going back to (3) and (6)), but it is often more convenient
to proceed otherwise. A treatment of the classical problem is
available in books on probability (see, for example, Uspensky (1937),
Chapters 5 ...), and we shall proceed at once to another general
method, due to Wald.
Lemma I. Denote the random value of $n$ at which $S_n$ first reaches
one of the boundaries by $N$. Then

$$n^s P\{N > n\} \to 0, \quad \text{as } n \to \infty, \tag{8}$$

for any finite $s$. For since $\sigma^2 > 0$,
$E\{S_n^2\} \to \infty$ as $n \to \infty$. Hence there is a $k$ such
that

$$P\{S_k^2 < (b-a)^2\} < 1, \quad \le 1 - \lambda, \text{ say.}$$

Hence for $n_h = hk$,

$$P\{(S_{kr+k} - S_{kr})^2 < (b-a)^2 \text{ for all } r = 0, \dots, h-1\} \le (1-\lambda)^h,$$

since the increments in $X_r$ are independent. Any of the $h$
inequalities $|S_{kr+k} - S_{kr}| \ge (b-a)$ ensures that
$N \le n_h$. Thus finally we have shown that

$$P\{N > n\} \le (1-\lambda)^h = O(e^{-cn}) = o(n^{-s}) \quad \text{as } n \to \infty.$$

Lemma II. Denote the m.g.f. of each $X_r$ by $M(\theta)$, assumed to
exist for all real $\theta$, and to have the property

$$M(\theta) \to \infty \quad \text{as } \theta \to \pm\infty \tag{9}$$

(this last property follows if $P\{e^X < 1 - \delta\} > 0$ and
$P\{e^X > 1 + \delta\} > 0$ for some $\delta > 0$). It follows also
that

$$M'(\theta) = E\{Xe^{\theta X}\}, \qquad M''(\theta) = E\{X^2 e^{\theta X}\}, \tag{10}$$

where dashes denote differentiation with respect to $\theta$. Then if
$m = M'(0) \ne 0$, there is one and only one real root
$\theta_0 \ne 0$ such that $M(\theta_0) = 1$.

For from (10) we have $M''(\theta) > 0$ for all real $\theta$; hence
from (9) $M(\theta)$ has only one minimum, at $\theta_1$ say. But
$M'(0) \ne 0$, hence $\theta_1 \ne 0$, and
$M(\theta_1) < M(0) = 1$. The required result then obviously follows.
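The argument of Lemma II also suggests how the root may be located numerically. The following is only a minimal sketch, assuming a discrete step distribution supplied as lists of values and probabilities; the function names `mgf` and `nonzero_root` are illustrative and do not come from the text. It uses the convexity of M(θ): on the side of the origin where M first dips below one, M stays below one until the root is passed.

```python
import math

def mgf(theta, values, probs):
    """Moment-generating function M(theta) = E{exp(theta * X)} of a discrete X."""
    return sum(p * math.exp(theta * x) for x, p in zip(values, probs))

def nonzero_root(values, probs, tol=1e-12):
    """Locate the root theta0 != 0 of M(theta) = 1 by bisection.

    Assumes the mean m is non-zero and that X takes both positive and
    negative values, so that M(theta) -> infinity in both directions."""
    m = sum(p * x for x, p in zip(values, probs))
    if m == 0:
        raise ValueError("mean is zero: theta = 0 is the only real root")
    sign = -1.0 if m > 0 else 1.0      # side on which M first falls below 1
    inner = 1.0
    while mgf(sign * inner, values, probs) >= 1.0:
        inner /= 2.0                   # move towards 0 until M < 1
    outer = inner
    while mgf(sign * outer, values, probs) < 1.0:
        outer *= 2.0                   # move away from 0 until M >= 1
    while outer - inner > tol:
        mid = 0.5 * (inner + outer)
        if mgf(sign * mid, values, probs) < 1.0:
            inner = mid
        else:
            outer = mid
    return sign * 0.5 * (inner + outer)

# Unit steps +1 and -1 with probabilities p = 0.6 and q = 0.4.
theta0 = nonzero_root([1, -1], [0.6, 0.4])
```

For unit steps the root is known in closed form, log(q/p), which gives a convenient check on the bisection.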
Wald's identity.

$$E\{e^{\theta S_N}[M(\theta)]^{-N}\} = 1 \quad (|M(\theta)| \ge 1). \tag{11}$$

To prove (11), let $j$ denote a constant integer, and
$P_j = P\{N \le j\}$. Then

$$E\{e^{\theta S_j}\} = P_j E_j\{e^{\theta S_j}\} + Q_j E_j'\{e^{\theta S_j}\},$$

where $Q_j = 1 - P_j$, $E_j$ denotes expectation conditional on
$N \le j$, $E_j'$ expectation conditional on $N > j$. Now for any
fixed $j \ge N$, $S_j - S_N$ is independent of $S_N$, and

$$E_j\{e^{\theta S_j}\} = E_j\{e^{\theta S_N}[M(\theta)]^{j-N}\}.$$

Hence as $E\{e^{\theta S_j}\}$ is also $[M(\theta)]^j$, we have

$$P_j E_j\{e^{\theta S_N}[M(\theta)]^{-N}\} + Q_j E_j'\{e^{\theta S_j}\}[M(\theta)]^{-j} = 1.$$

Now $Q_j \to 0$ as $j \to \infty$, and $E_j'\{e^{\theta S_j}\}$ is
bounded (as this expectation is conditional on $N > j$, so that
$|S_j| < b - a$). Hence finally, for values of $\theta$ for which
$|M(\theta)| \ge 1$, we obtain in the limit the identity (11).
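The identity (11) can be checked numerically in the simplest case. The sketch below assumes unit steps (+1 with probability p, -1 with probability 1 - p) and integer boundaries, so that a boundary is always reached exactly; the function name and the truncation at `n_max` stages are illustrative choices, not part of the text.

```python
import math

def wald_expectation(p, a, b, theta, n_max=5000):
    """Approximate E{exp(theta * S_N) * M(theta)**(-N)} for unit steps,
    stopping on first reaching a (< 0) or b (> 0), with a and b integers.
    Intended for theta with M(theta) >= 1, as required in (11)."""
    q = 1.0 - p
    m_theta = p * math.exp(theta) + q * math.exp(-theta)
    # dist[x - a] = P{S_n = x and no boundary reached so far}
    dist = [0.0] * (b - a + 1)
    dist[-a] = 1.0
    total = 0.0
    for n in range(1, n_max + 1):
        new = [0.0] * (b - a + 1)
        for i in range(1, b - a):
            new[i + 1] += p * dist[i]
            new[i - 1] += q * dist[i]
        # probability arriving at an end point is absorbed at stage N = n
        absorbed = new[0] * math.exp(theta * a) + new[-1] * math.exp(theta * b)
        total += absorbed * m_theta ** (-n)
        new[0] = new[-1] = 0.0
        dist = new
    return total

value = wald_expectation(0.55, -3, 5, 0.3)
```

With p = 0.55 and θ = 0.3 we have M(θ) > 1, and the truncated sum should agree with unity to within rounding error, since the unabsorbed probability decays geometrically.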

In the above proof we have only made use of Lemma I for $s = 0$; it
may be shown further that (11) may be differentiated with respect to
$\theta$ any number of times under the expectation sign. It should
also be noticed that (11) still holds if $a \to -\infty$, provided
$P_j \to 1$ and that the real part of $\theta$ is non-negative. A
special case is when $X \ge 0$; here the conditions of Lemma II
(which we make use of presently) break down, and it is readily seen
that then there is no root $\theta_0 \ne 0$ for which
$M(\theta_0) = 1$.


To apply the above result to cases when one or other of the
boundaries is exactly reached at stage $N$, we put $S_N = a$ with
probability $P_a$, say, and $S_N = b$ with probability
$P_b = 1 - P_a$. Then

$$P_a e^{\theta a} E_a\{[M(\theta)]^{-N}\} + P_b e^{\theta b} E_b\{[M(\theta)]^{-N}\} = 1, \tag{12}$$

where $E_a$ denotes expectation conditional on $S_N = a$, etc. If in
(12) we put $\theta = \theta_0$, we obtain

$$P_a = \frac{1 - e^{b\theta_0}}{e^{a\theta_0} - e^{b\theta_0}}. \tag{13}$$

For example, in the gambling problem we have

$$M(\theta) = pe^{\theta} + qe^{-\theta}, \qquad \theta_0 = \log(q/p),$$

whence

$$P_a = \frac{1 - (q/p)^b}{(q/p)^a - (q/p)^b}. \tag{14}$$
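As a check on (14), the absorption probability may also be obtained by following the walk stage by stage until practically all the probability has been absorbed. This is a minimal sketch under the assumptions of the gambling problem (unit steps, integer boundaries, p different from one-half); the function names and the cut-off `n_max` are illustrative.

```python
def ruin_probability_formula(p, a, b):
    """P_a from (14): probability that S_N = a, with P{X = +1} = p != 1/2."""
    rho = (1.0 - p) / p
    return (1.0 - rho ** b) / (rho ** a - rho ** b)

def ruin_probability_direct(p, a, b, n_max=5000):
    """The same probability found by propagating the walk until absorption."""
    q = 1.0 - p
    dist = [0.0] * (b - a + 1)     # dist[x - a] = P{S_n = x, not yet absorbed}
    dist[-a] = 1.0
    absorbed_at_a = 0.0
    for _ in range(n_max):
        new = [0.0] * (b - a + 1)
        for i in range(1, b - a):
            new[i + 1] += p * dist[i]
            new[i - 1] += q * dist[i]
        absorbed_at_a += new[0]
        new[0] = new[-1] = 0.0     # absorbed probability leaves the system
        dist = new
    return absorbed_at_a

formula_value = ruin_probability_formula(0.55, -3, 5)
direct_value = ruin_probability_direct(0.55, -3, 5)
```

The two values should agree closely because, for unit steps, the boundaries are reached exactly and (14) involves no approximation.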


Moreover, according to Lemma II (note that $M(\theta)$, as a function
of the complex variable $\theta$, is not singular at 0 or
$\theta_0$), the equation

$$-\log M(\theta) = i\phi \tag{15}$$

has two roots $\theta_1(\phi)$, $\theta_2(\phi)$ such that
$\theta_1(\phi) \to 0$ and $\theta_2(\phi) \to \theta_0$, as
$\phi \to 0$. Hence

$$\begin{aligned}
P_a e^{a\theta_1} C_a(\phi) + P_b e^{b\theta_1} C_b(\phi) &= 1,\\
P_a e^{a\theta_2} C_a(\phi) + P_b e^{b\theta_2} C_b(\phi) &= 1,
\end{aligned} \tag{16}$$

where $C_a(\phi)$ is the characteristic function of $N$ conditional
on $S_N = a$, etc., so that

$$E\{e^{i\phi N}\} = C(\phi) = P_a C_a(\phi) + P_b C_b(\phi). \tag{17}$$

Equations (13), (16) and (17) determine the complete distribution of
$N$, the number of stages required to reach the boundaries. The
importance of these formulae lies also in their approximate
applicability even when the boundaries are passed rather than
precisely reached, for we may still put† $S_N \sim a$ or $b$, leading
to the same formulae (13), (16) and (17) as approximations which
become asymptotically exact as the individual steps become small in
comparison with the distances to be traversed. As a further
approximation we might treat $X$ itself as a normal variable, since
we know that the unrestricted distribution of $S_n$ tends to
normality, and this will apply also to the resultant of several small
steps. In the limit when the individual steps become infinitesimally
small, these approximate formulae will again become exact.

† The symbol $\sim$ will be used either for ‘asymptotically
equivalent to’ or, more loosely, for ‘approximately equal to’; which
meaning is intended should be clear from the context.

Now, for $X$ normal, equation (15) becomes

$$\theta_1, \theta_2 = \frac{-m \pm \sqrt{m^2 - 2i\phi\sigma^2}}{\sigma^2}. \tag{18}$$

As $\theta_0 = -2m/\sigma^2$, formula (13) gives

$$P_a = \frac{1 - e^{-2mb/\sigma^2}}{e^{-2ma/\sigma^2} - e^{-2mb/\sigma^2}}. \tag{19}$$

When $a \to -\infty$, we require the real part of $\theta$
non-negative. The relevant root in (18) ($m > 0$) is the one with the
+ sign. As $P_a \to 0$, (17) becomes

$$C(\phi) \sim e^{-b\theta_1(\phi)}, \tag{20}$$

whence

$$\log C(\phi) \sim -b\theta_1(\phi) = \frac{bm}{\sigma^2} - \frac{bm}{\sigma^2}\sqrt{1 - \frac{2i\phi\sigma^2}{m^2}} \tag{21}$$

for the normal case. Expanding (21) in powers of $i\phi$ we obtain
the cumulants of $N$,

$$\kappa_1 = b/m, \qquad \kappa_2 = b\sigma^2/m^3, \quad \dots \tag{22}$$


Returning to the identity (11), we will obtain some moment formulae
by differentiation. Differentiating once,

$$E\{S_N e^{\theta S_N}[M(\theta)]^{-N} - N e^{\theta S_N}[M(\theta)]^{-N-1} M'(\theta)\} = 0,$$

or for $\theta = 0$,

$$E\{S_N\} = E\{N\}E\{X\}. \tag{23}$$

On our approximating assumptions, this gives

$$E\{N\} = (aP_a + bP_b)/m, \tag{24}$$

provided $m \ne 0$. If $m = 0$, we obtain from a second
differentiation of (11) (which is still valid, as no condition on $m$
was used in the proof),

$$E\{S_N^2\} = E\{N\}E\{X^2\} \quad (m = 0). \tag{25}$$

Now in (13) as $m \to 0$, and $\theta_0 \to 0$, $P_b$ has a definite
limit $-a/(b-a)$, $E\{S_N^2\}$ becomes $-ab$, and hence

$$E\{N\} = -ab/\sigma^2 \quad (m = 0). \tag{26}$$
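Formulae (24) and (26) can be illustrated for unit steps, where the boundaries are reached exactly and no approximation is involved. The sketch below is only an illustration under that assumption; the function name `absorption_summary` and the cut-off `n_max` are not from the text.

```python
def absorption_summary(p, a, b, n_max=5000):
    """Return (P_a, P_b, E{N}) for unit steps with P{X = +1} = p and integer
    boundaries a < 0 < b, by following the walk stage by stage."""
    q = 1.0 - p
    dist = [0.0] * (b - a + 1)     # dist[x - a] = P{S_n = x, not yet absorbed}
    dist[-a] = 1.0
    prob_a = prob_b = mean_n = 0.0
    for n in range(1, n_max + 1):
        new = [0.0] * (b - a + 1)
        for i in range(1, b - a):
            new[i + 1] += p * dist[i]
            new[i - 1] += q * dist[i]
        prob_a += new[0]
        prob_b += new[-1]
        mean_n += n * (new[0] + new[-1])
        new[0] = new[-1] = 0.0
        dist = new
    return prob_a, prob_b, mean_n

a, b = -3, 5

# Fair game: m = 0 and sigma^2 = 1, so (26) gives E{N} = -a*b.
_, fair_pb, fair_mean = absorption_summary(0.5, a, b)

# Biased game: m = p - q, and (24) gives E{N} = (a*P_a + b*P_b) / m.
pa, pb, biased_mean = absorption_summary(0.55, a, b)
predicted = (a * pa + b * pb) / (0.55 - 0.45)
```

In the fair case the computed mean duration should be close to 15, and the probability of reaching b close to -a/(b - a) = 3/8.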
As a second limiting case, we may note that if $-a$ becomes large
compared with the individual steps, but $m > 0$ so that
$\theta_0 < 0$, then $P_b \to 1$. Even if $m = 0$, we had
$P_b = -a/(b-a)$, and this still tends to unity when $-a$ increases,
although in this case from (26) $E\{N\}$ has no finite limit. It is
this last case which covers the classical problem of the gambler's
ruin, where a player competing even on fair terms ($m = 0$) with a
rich enough adversary ($-a$ large) is almost certain to go bankrupt
($P_b \to 1$).

If $m < 0$, so that $\theta_0 > 0$, note that

$$P_b \to e^{-b\theta_0} \quad (a \to -\infty). \tag{27}$$


2.11 Renewals. As a special case of the random variable $X$,
consider a distribution for which $X \ge 0$ (and hence $m > 0$). This
is relevant to the case of an article having an effective random
lifetime $X$ before being replaced by a new one, and so on. The total
lifetime obtained from $n$ articles is just $S_n$ and tends to
normality as $n$ increases. If we consider the number $N$ of renewals
or replacements required up to time $t$, including the initial one,
we note that $N$ is the first value of $n$ for which $S_n > t$. The
equations (3)-(7) of § 2.1 reduce to

$$F_n(x) = \int_0^t F(x-y)\, dF_{n-1}(y), \tag{1}$$

$$P(\lambda) = \sum_n p_n \lambda^{n-1}, \tag{2}$$

where $p_n$ is the probability $F_n(\infty) - F_n(t)$ of reaching the
boundary $t$ with the number $n$,

$$G(x, \lambda) = \sum_{r=1}^{\infty} \lambda^{r-1} F_r(x) \tag{3}$$

$$= F(x) + \lambda \int_0^t F(x-y)\, dG(y, \lambda). \tag{4}$$

Putting $\lambda = 1$ in (3) and (4), and noting that $F_r(x)$, for
$x \le t$, denotes the unrestricted r-fold convolution of $F(x)$, we
obtain the standard renewal equations for this problem, either the
direct Volterra integral equation (4) for the total expectation of
renewals of all orders or ‘generations’ in the interval $(0, x)$, or
(1) and (3) providing an iterative solution in terms of successive
‘generations’.

In the case when $F(x)$ and hence $G(x, \lambda)$ have densities
$f(x)$, $g(x, \lambda)$, we have by differentiating (4) with respect
to $x$,

$$g(x, \lambda) = f(x) + \lambda \int_0^t f(x-y)\, g(y, \lambda)\, dy. \tag{5}$$

The solution of this equation (when $x = t$) has been discussed by
Feller (1941), and we summarize the relevant results.

(i) Let the Laplace transforms of $f(x)$ and $h(x) = \lambda f(x)$
(the solution applies in the case of a more general function $h(x)$
occurring inside the integral) be

$$\alpha(\psi) = \int_0^\infty e^{-\psi x} f(x)\, dx, \qquad \gamma(\psi) = \int_0^\infty e^{-\psi x} h(x)\, dx,$$

existing at least for $\psi > \sigma$. Then the unique non-negative
solution $g(x)$ has Laplace transform $\beta(\psi)$ given by

$$\beta(\psi) = \frac{\alpha(\psi)}{1 - \gamma(\psi)}. \tag{6}$$

(If $\lim_{\psi \to \sigma} \gamma(\psi) > 1$, let
$\sigma' > \sigma$ be the real root of the equation
$\gamma(\psi) = 1$; otherwise, if
$\lim_{\psi \to \sigma} \gamma(\psi) \le 1$, write
$\sigma' = \sigma$. Then (6) holds for $\psi > \sigma'$.)

(ii) In order that $g(x)$ defined by (6) can be represented by a
series solution of the form

$$g(x) = \sum_j c_j e^{\psi_j x}, \tag{7}$$

where $\psi_j$ are the roots, assumed simple, of $\gamma(\psi) = 1$
(the series converging absolutely for $x \ge 0$), it is necessary and
sufficient that

$$\beta(\psi) = \sum_j \frac{c_j}{\psi - \psi_j} \tag{8}$$

and that $\sum_j c_j$ converges absolutely. The coefficients $c_j$
are then given by

$$c_j = -\frac{\alpha(\psi_j)}{\gamma'(\psi_j)}. \tag{9}$$

In particular, it is necessary that $\beta(\psi)$ is a one-valued
function.


Example 1. As an illustration consider the simplest case where
$f(x) = \mu e^{-\mu x}$, which is well known to represent the case of
a purely random renewal process (see, for example, the next chapter).
We easily find that $\beta(\psi) = (1 - \lambda + \psi/\mu)^{-1}$,
whence

$$g(x, \lambda) = \mu e^{-\mu(1-\lambda)x}. \tag{10}$$

This result is obviously true for $x < t$ as well as $x = t$, for if
$x < t$ the upper limit of integration in (5) is $x$ instead of $t$,
since $f(x) = 0$ for $x < 0$. To obtain $g(x, \lambda)$ for $x > t$
we evaluate it directly from (5), obtaining

$$g(x, \lambda) = \mu e^{\mu\lambda t - \mu x} \quad (x > t). \tag{11}$$

Putting $\lambda = 1$ we obtain
$g(x, 1) = \sum_r f_r(x) = \mu$ ($x < t$), representing a constant
renewal rate from all generations. Further,

$$P(\lambda) = \int_t^\infty g(x, \lambda)\, dx = e^{\mu t(\lambda - 1)}, \tag{12}$$

which is the p.g.f. of a Poisson distribution with mean $\mu t$,
indicating that $(\mu t)^r e^{-\mu t}/r!$ is the probability that the
number of further renewals (not including the initial article) in the
interval $(0, t)$ is $r$.
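The renewal equation (5), with upper limit x (that is, for x not exceeding t), can also be solved numerically and compared with the closed form (10). The sketch below uses a simple trapezoidal rule on an equally spaced grid; this discretization, the function name `solve_renewal_density` and the particular values of μ and λ are assumptions made for illustration, not a method given in the text.

```python
import math

def solve_renewal_density(f, lam, x_max, n):
    """Trapezoidal solution of g(x) = f(x) + lam * int_0^x f(x - y) g(y) dy
    on the grid x_i = i * h, h = x_max / n. Returns [g(x_0), ..., g(x_n)]."""
    h = x_max / n
    fv = [f(i * h) for i in range(n + 1)]
    g = [fv[0]]                                  # at x = 0 the integral vanishes
    for i in range(1, n + 1):
        inner = sum(fv[i - j] * g[j] for j in range(1, i))
        rhs = fv[i] + lam * h * (0.5 * fv[i] * g[0] + inner)
        # the end-point term 0.5 * h * f(0) * g(x_i) is moved to the left side
        g.append(rhs / (1.0 - 0.5 * lam * h * fv[0]))
    return g

mu, lam, n = 2.0, 0.5, 400
g = solve_renewal_density(lambda x: mu * math.exp(-mu * x), lam, 1.0, n)
exact = [mu * math.exp(-mu * (1.0 - lam) * i / n) for i in range(n + 1)]
max_error = max(abs(u - v) for u, v in zip(g, exact))
```

The same routine accepts any lifetime density `f`, so it could be applied to a case where no closed form is available; its accuracy is limited by the grid spacing.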

$P(\lambda)$ in (12) may be obtained rather more directly if desired
by noting that from (4)

$$G(\infty, \lambda) = 1 + \lambda G(t, \lambda),$$

whence in general

$$P(\lambda) = 1 + (\lambda - 1)\, G(t, \lambda). \tag{13}$$

The above results are exact. If we make use of the alternative
approach via Wald's identity and equation (20) of § 2.1, we derive in
this example

$$M(\theta) = (1 - \theta/\mu)^{-1}, \qquad \theta_1(\phi) = \mu(1 - e^{i\phi}),$$

$$C(\phi) \sim \exp\{\mu t(e^{i\phi} - 1)\}.$$

As the last replacement does not occur exactly at $t$, the last
equation is only established as an approximate result valid for large
$t$. Comparison with (12) shows that in this case it also happens to
be exact, but this is not of course generally so (see the next
example).

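Example 2. As a less trivial example, suppose

$$f(x) = \mu^2 x e^{-\mu x}. \tag{14}$$

Then we find $\beta(\psi) = [(1 + \psi/\mu)^2 - \lambda]^{-1}$,
whence

$$g(x, \lambda) = (\mu/\sqrt{\lambda})\, e^{-\mu x} \sinh(\mu x\sqrt{\lambda}) \quad (x \le t). \tag{15}$$

To obtain $P(\lambda)$ we may use formula (13), giving

$$P(\lambda) = e^{-\mu t}\{\cosh(\mu t\sqrt{\lambda}) + [\sinh(\mu t\sqrt{\lambda})]/\sqrt{\lambda}\}. \tag{16}$$

The asymptotic approach via Wald's identity gives at once

$$C(\phi) \sim \exp\{\mu t(e^{\frac{1}{2}i\phi} - 1)\},$$

whence $\kappa_1 \sim \tfrac{1}{2}\mu t$,
$\kappa_2 \sim \tfrac{1}{4}\mu t$, compared with the exact results
from (16)

$$\begin{aligned}
\kappa_1 &= \tfrac{1}{2}\mu t - \tfrac{1}{2}e^{-\mu t}\sinh\mu t = \tfrac{1}{2}\mu t - \tfrac{1}{4} + O(e^{-2\mu t}),\\
\kappa_2 &= \mu t\, e^{-\mu t}\sinh\mu t + \tfrac{1}{4}e^{-\mu t}\sinh\mu t - \tfrac{1}{4}\mu t - \tfrac{1}{4}e^{-2\mu t}\sinh^2\mu t\\
&= \tfrac{1}{4}\mu t + \tfrac{1}{16} + O(\mu t\, e^{-2\mu t}).
\end{aligned} \tag{17}$$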
An interesting application of renewal theory is to the theory of
genetic recombination. The frequency distribution of ‘lifetimes’
between renewals becomes replaced by the frequency distribution of
chromosomal length between points of exchange, and the number of
renewals by the number of points of exchange. In this theory one
significant quantity is the ‘recombination fraction’, which is the
probability of an odd number of points of exchange, and it is of
interest whether this fraction rises above 50%. To examine this for
the last example, which has been used as a model in this theory, we
note that this probability $p(t)$ is given by

$$p(t) = \tfrac{1}{2}[1 - P(-1)] \tag{18}$$

$$= \tfrac{1}{2} - \tfrac{1}{2}e^{-\mu t}(\cos\mu t + \sin\mu t), \tag{19}$$

which rises to 0.5216 at $\mu t = \pi$ (for technical details of this
application, see Owen (1949), where further general formulae, for
example for the number of ‘renewals’ between $t_1$ and $t_2$, will
also be found).
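The quoted maximum of the recombination fraction is easily confirmed. The sketch below evaluates the p.g.f. of Example 2 at λ = -1, using cosh(iu) = cos u and sinh(iu)/i = sin u; the function name and the grid used to look for larger values are illustrative choices.

```python
import math

def recombination_fraction(mu_t):
    """p(t) = (1/2) * [1 - P(-1)], with P(-1) = exp(-mu t) * (cos mu t + sin mu t)."""
    return 0.5 * (1.0 - math.exp(-mu_t) * (math.cos(mu_t) + math.sin(mu_t)))

peak = recombination_fraction(math.pi)

# Scan mu*t over (0, 20] to confirm that no larger value occurs on this grid.
grid_max = max(recombination_fraction(0.01 * i) for i in range(1, 2001))
```

The derivative of p(t) is proportional to e^{-μt} sin μt, so the first and highest maximum occurs at μt = π, where p = ½(1 + e^{-π}).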

Further aspects of these random walk and renewal problems 
will be taken up later, after the theory of stochastic processes 
in continuous time has been developed. 


2.2 Markov chains


By considering the sums $S_n$ rather than the original random
sequence elements $X_r$ we obtained in § 2.1 new sequences whose
elements were no longer independent. We shall now consider elementary
examples of stochastic processes where the dependence between
successive terms of the sequence is a more intrinsic property of the
process. We first define generally Markov processes, so termed after
the famous Russian mathematician who systematically studied
particular cases of them. By a Markov process we shall mean a
stochastic process for which the values of $X_r$ at any set of times
$t_r$ depend on the values at previous times only through the most
recent of them. The distribution of $X_0, X_1, \dots, X_n$
($t_0 < t_1 < \dots < t_n$) is consequently, in the notation used in
§ 1.21, $x_r$ denoting any possible value of $X_r$,

$$p(x_0, x_1, \dots, x_n) = p(x_0)\, p(x_1 \mid x_0)\, p(x_2 \mid x_1) \cdots p(x_n \mid x_{n-1})
= p(x_0) \prod_{r=1}^{n} p(x_r \mid x_{r-1}), \tag{1}$$

so that a Markov process is defined by the conditional distribution
$p(x_r \mid x_{r-1})$ (for any $r$ ($1, \dots, n$), and arbitrary
choice of the set $t_r$), together with initial distribution
$p(x_0)$.


Theoretically, this appears a severe restriction on the type of
process covered, but in practice many processes are found to be so
characterized, especially if it is noted that processes which are not
at first sight Markovian may become so in a ‘phase-space’ of higher
dimensions, i.e. by a vector characterization of the process. For
example, in classical physics the velocity as well as the position of
a particle are required at any instant before the future motion of
the particle can be predicted. Again, in population studies the
number of births in the past year at time $t$ cannot be adequately
predicted from the number of births at a previous time $t'$, but may
be so from a detailed knowledge of the numbers of individuals in all
age-groups at that time.
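For any Markov process we obtain at once from (1)

$$\begin{aligned}
p(x_r \mid x_{r-2}) &= \int_{-\infty}^{\infty} p(x_r, x_{r-1} \mid x_{r-2})\, dx_{r-1}\\
&= \int_{-\infty}^{\infty} p(x_r \mid x_{r-1}, x_{r-2})\, p(x_{r-1} \mid x_{r-2})\, dx_{r-1}\\
&= \int_{-\infty}^{\infty} p(x_r \mid x_{r-1})\, p(x_{r-1} \mid x_{r-2})\, dx_{r-1},
\end{aligned} \tag{2}$$

an important relation known as the Chapman-Kolmogorov equation. We
may obtain $p(x_r \mid x_0)$ by its repeated use, and hence from the
further formula

$$p(x_r) = \int_{-\infty}^{\infty} p(x_r \mid x_0)\, p(x_0)\, dx_0 \tag{3}$$

we may obtain also $p(x_r)$.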
In the case of Markov processes with discrete $X$, to be considered
in this section, and called Markov chains, we may write equations
such as (2) and (3) in matrix and vector notation. Let $p_i(t_r)$
denote the probability of the $i$th state or possible value of $X_r$
at time $t_r$, and $q_{ij}(t_r, t_{r-1})$ the corresponding
conditional probability of the $i$th state at time $t_r$ given the
$j$th state at time $t_{r-1}$ ($< t_r$). Writing $(p_i)$ as a column
vector $p$ and $(q_{ij})$ as the square matrix $Q$ (with constant $i$
in any row), we have, for example,

$$Q(t_r \mid t_{r-2}) = Q(t_r \mid t_{r-1})\, Q(t_{r-1} \mid t_{r-2}), \tag{4}$$

$$p(t_r) = Q(t_r \mid t_{r-1})\, p(t_{r-1}). \tag{5}$$

(Notice that to preserve both the usual convention for the order of
suffices in the matrix $Q$ and the column vectors of $Q$ as
conditional distributions, the second suffix and time in $Q$ refer to
the previous time. To emphasize this, the previous time is put to the
right of a vertical stroke, corresponding to the given previous
state.)
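The column convention of (4) and (5) is easy to get wrong in computation, so a small sketch may help. The two transition matrices and the initial vector below are made-up numbers chosen only so that every column sums to one; the helper names `mat_vec` and `mat_mul` are likewise illustrative. Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction as F

def mat_vec(q, p):
    """Column-vector update p(t_r) = Q p(t_{r-1}), as in (5)."""
    return [sum(q[i][j] * p[j] for j in range(len(p))) for i in range(len(q))]

def mat_mul(a, b):
    """Matrix product a b, used for the two-step matrix in (4)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# q_10[i][j] = P{state i at t_1 | state j at t_0}; each COLUMN sums to one.
q_10 = [[F(1, 2), F(1, 4), F(0)],
        [F(1, 2), F(1, 2), F(1, 3)],
        [F(0),    F(1, 4), F(2, 3)]]
# q_21[i][j] = P{state i at t_2 | state j at t_1}.
q_21 = [[F(1, 3), F(0), F(1, 2)],
        [F(1, 3), F(1), F(0)],
        [F(1, 3), F(0), F(1, 2)]]

p0 = [F(1, 2), F(1, 4), F(1, 4)]

q_20 = mat_mul(q_21, q_10)                      # later time on the left, as in (4)
p2_direct = mat_vec(q_20, p0)
p2_stepwise = mat_vec(q_21, mat_vec(q_10, p0))
```

Note the order of the factors: the matrix for the later interval stands on the left, which is the reason for the convention described in the parenthetical remark above.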
There are two main classes of Markov chains, according to the nature
of $t$. In this chapter we consider random sequences, i.e. $t$ is
discrete.† As a further subdivision we may have the number of states,
even if discrete, finite or infinite. We consider first the simplest
case of a finite number $k$ of possible states (this case has been
discussed at length by Fréchet (1937-8), second livre). Under these
conditions equation (5) represents a finite set of simultaneous
linear difference equations with an immediate solution for $p(t_r)$
in terms of $p(t_0)$, say, given by

$$p(t_r) = \prod_{s=0}^{r-1} Q(t_{r-s} \mid t_{r-s-1})\; p(t_0), \tag{6}$$

where the matrix product is ordered from left to right. This direct
solution is quite important in practice, as, if $Q$ and $p(t_0)$ are
given numerically, the successive vectors $p(t_{r-s})$
($s = r-1, \dots, 0$) are obtained by routine computational
procedure. For constant $Q$ we have simply

$$p(t_r) = Q^r p(t_0). \tag{7}$$

However, to study the structure and ultimate behaviour of the
solution (7), it is often convenient to make use of standard
expansions or ‘spectral resolutions’ of $Q$ in terms of its latent or
characteristic roots $\lambda$. The $\lambda$'s are the roots of the
equation

$$|\lambda I - Q| = 0, \tag{8}$$

where $|\lambda I - Q|$ is the determinant of the characteristic
matrix $\Phi(\lambda) = \lambda I - Q$, $I$ denoting the unit matrix.
The detailed expansion is simplest in the non-degenerate case when
the $\lambda$'s are all different. In this case we define for each
$\lambda_i$ a column and row vector $s_i$, $t_i'$ respectively, where

$$Qs_i = \lambda_i s_i, \qquad t_i'Q = \lambda_i t_i', \tag{9}$$

or, if $S \equiv (s_1, s_2, \dots, s_k)$,
$T \equiv (t_1, t_2, \dots, t_k)$,

$$QS = S\Lambda, \qquad T'Q = \Lambda T', \tag{10}$$

where $\Lambda$ is the diagonal matrix of roots $\lambda_i$. Hence

$$Q = S\Lambda S^{-1} = (T')^{-1}\Lambda T'. \tag{11}$$

† Some writers restrict the use of the word ‘chain’ to this case.
And as

$$t_j'Qs_i = t_j'(\lambda_i s_i) = (\lambda_j t_j')s_i,$$

$t_j's_i = 0$ for $j \ne i$ (and $\lambda_i \ne \lambda_j$). We may
choose the scale of $t_i$ and $s_i$ so that $t_i's_i = 1$, and then
$T'S = I$, and (11) becomes

$$Q = S\Lambda T' = \sum_{i=1}^{k} \lambda_i s_i t_i'. \tag{12}$$

The spectral set of matrices $A_i \equiv s_i t_i'$ is easily seen to
have the properties

$$A_iA_j = \begin{cases} 0 & (i \ne j),\\ A_i & (i = j), \end{cases} \qquad \text{and} \qquad \sum_{i=1}^{k} A_i = I,$$

the last equation following from the relations

$$\Big(\sum_{i=1}^{k} s_i t_i'\Big)s_j = s_j, \qquad t_j'\Big(\sum_{i=1}^{k} s_i t_i'\Big) = t_j',$$

whence the equation analogous to (11) reads

$$\sum_{i=1}^{k} A_i = SIS^{-1} = I.$$

We now obtain

$$Q^r = \sum_{i=1}^{k} \lambda_i^r A_i. \tag{13}$$

As the $s_i$ form a linearly independent set, we may express the
initial vector $p(t_0) \equiv p_0$, say, in terms of them. In fact

$$p_0 = \Big(\sum_i s_i t_i'\Big)p_0 = \sum_{i=1}^{k} (t_i'p_0)\, s_i.$$

Hence the solution $p(t_r) \equiv p_r$ has the structure

$$p_r = \sum_{i=1}^{k} a_i \lambda_i^r s_i, \tag{14}$$

where $a_i = t_i'p_0$. We may note further that

$$(1, 1, \dots, 1)\, Q = (1, 1, \dots, 1),$$

since the sum of the elements in any column of $Q$ is necessarily
unity from its definition. Hence there is always one root
$\lambda_1 = 1$, and correspondingly we may choose
$t_1' = (1, 1, \dots)$. Since $t_1'p_0$ is also unity, $a_1 = 1$.
Thus (14) is of the form

$$p_r = s_1 + \sum_{i=2}^{k} a_i \lambda_i^r s_i. \tag{15}$$


The root $\lambda_1 = 1$ cannot be exceeded in modulus by any other
root. For as $|\Phi(\lambda)| = 0$, there is at least one non-trivial
solution of the equation $t'(Q - \lambda I) = 0$. Let $\tau_m$ be the
maximum element of $\|t\|$ for any $\lambda$, where $\|x\|$ denotes
the absolute values of the components of $x$. Then

$$\tau_m \ge (\|t'\|Q)_m \ge \|\lambda\|\tau_m,$$

whence $\|\lambda\| \le 1$ (cf. Fréchet, 1937-8, p. 105).
From (15) it follows that for the case of simple roots the latent
vector $s_1$ will give a limiting distribution for $p_r$ (independent
of $p_0$) as $r \to \infty$, provided that no other latent roots have
modulus equal to one. A more general discussion of the asymptotic
behaviour of $p_r$ will be given in § 2.21. Still considering for the
moment the case of simple roots, we note the identity

$$\Phi(\lambda_i)\, \mathrm{adj}\,\Phi(\lambda_i) = |\Phi(\lambda_i)|\, I = 0, \tag{16}$$

where $\mathrm{adj}\,\Phi(\lambda_i)$ denotes the adjoint of
$\Phi(\lambda_i)$. But $\Phi(\lambda_i)s_i = 0$, whence a solution
$s_i$ can be obtained from any column of
$\mathrm{adj}\,\Phi(\lambda_i)$. Alternatively, as any matrix $Q$
satisfies its own characteristic equation,

$$\prod_{j=1}^{k} (Q - \lambda_j I) = (Q - \lambda_i I)\prod_{j \ne i} (Q - \lambda_j I) = 0, \tag{17}$$

whence $s_i$ is proportional also to any column of
$\prod_{j \ne i}(Q - \lambda_j I)$. Similarly $t_i'$ can be taken
proportional to any row of either matrix, with the adjustment
$t_i's_i = 1$.

If $Q' = Q$, it is evident that $s_i$ is proportional to $t_i$ and in
the limiting distribution all states are equally probable.


Example 1. As an example consider a very simple model of an
individual learning to make a correct response to a certain stimulus.
We postulate (cf. Bush and Mosteller, 1951) that two possible
responses to the stimulus are governed by the matrix

$$Q = \begin{pmatrix} 1-b & a\\ b & 1-a \end{pmatrix},$$

where the correct response at any trial corresponds to the first
state. The probabilities $a$ and $b$ depend on the nature of the
successive responses; if $a = b = \tfrac{1}{2}$, each response is
independent of previous ones; if $\tfrac{1}{2} < a = b < 1$, there is
a tendency to oscillate from one response to the other. If now we
take $a > b$, it implies a higher ‘transition probability’ from an
incorrect to a correct response than vice versa, i.e. a kind of
preferential treatment for correct responses. We readily find,
consistently with these ideas, that the expansion (12) reads

$$Q^r = \frac{1}{a+b}\begin{pmatrix} a & a\\ b & b \end{pmatrix} + \frac{(1-a-b)^r}{a+b}\begin{pmatrix} b & -a\\ -b & a \end{pmatrix},$$

whence for the probability of a correct response at the $r$th trial
we obtain

$$\frac{a}{a+b} + \frac{b}{a+b}(1-a-b)^r \qquad \text{or} \qquad \frac{a}{a+b} - \frac{a}{a+b}(1-a-b)^r,$$

depending on whether the response at the zeroth trial was correct or
not. Unless $a$ and $b$ are both unity, there is a limiting
probability $a/(a+b)$ of a correct response. When $a = b = 1$, we
obtain the extreme case of oscillation from one state to the other,
and no limiting probability exists. Note that, unless $b$ can be
reduced to zero by inhibition of the tendency to revert to the
incorrect response, the limiting probability of a correct response
does not reach unity.
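The spectral expansion for this two-state learning model can be verified by comparing it with direct repeated multiplication. The sketch below assumes the matrix has first column (1 - b, b) and second column (a, 1 - a), with latent roots 1 and 1 - a - b; the particular values of a and b and the function names are illustrative only.

```python
from fractions import Fraction

def learning_matrix(a, b):
    """Column-stochastic matrix; state 1 = correct response, state 2 = incorrect.
    a = P{incorrect -> correct}, b = P{correct -> incorrect}."""
    return [[1 - b, a], [b, 1 - a]]

def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def power_by_multiplication(q, r):
    result = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(r):
        result = mat_mul(q, result)
    return result

def power_by_spectral_expansion(a, b, r):
    """Q**r = A_1 + (1 - a - b)**r * A_2, with A_1 and A_2 the spectral matrices."""
    c = (1 - a - b) ** r / (a + b)
    return [[a / (a + b) + c * b, a / (a + b) - c * a],
            [b / (a + b) - c * b, b / (a + b) + c * a]]

a, b = Fraction(2, 5), Fraction(1, 5)
direct = power_by_multiplication(learning_matrix(a, b), 6)
spectral = power_by_spectral_expansion(a, b, 6)
limit_correct = a / (a + b)
```

The entry in the first row and first column of `spectral` is the probability of a correct response at trial 6 given a correct response at trial 0; as r grows it approaches `limit_correct`.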

Example 2. As a slightly more complicated example, consider an
application to the theory of inbreeding. The most powerful form of
inbreeding possible (in the sense of selection for a single pair of
allelomorphs $A$ and $a$, alternative genes at the same chromosome
locus) is by self-fertilization, possible with some plants. The
progeny of a mating $Aa \times Aa$ is represented by

$$(\tfrac{1}{2}A + \tfrac{1}{2}a)(\tfrac{1}{2}A + \tfrac{1}{2}a) = \tfrac{1}{4}AA + \tfrac{1}{2}Aa + \tfrac{1}{4}aa,$$

or 50% heterozygotes $Aa$, the proportion of heterozygotes being
reduced by one-half at each generation. For one of the next most
powerful methods, possible with some animals, that of brother-sister
mating, we may classify the different possible situations in terms of
the possible matings. From the mating of the progeny of the mating
$Aa \times Aa$ we obtain

$$(\tfrac{1}{4}AA + \tfrac{1}{2}Aa + \tfrac{1}{4}aa) \times (\tfrac{1}{4}AA + \tfrac{1}{2}Aa + \tfrac{1}{4}aa).$$

Similarly, from $Aa \times aa$ we obtain in the next generation
matings

$$(\tfrac{1}{2}Aa + \tfrac{1}{2}aa) \times (\tfrac{1}{2}Aa + \tfrac{1}{2}aa).$$

In this way our classification gives the table shown, which we treat
as our matrix $Q$. (The bracketed mating types are entirely
equivalent in relation to the heterozygotes $Aa$, and thus need not
be separated. The pooling of states in a Markov chain is not,
however, generally admissible, as will be seen later.)

                                    Parent mating
                      (AA x AA,    (AA x Aa,
    Progeny mating     aa x aa)     aa x Aa)    Aa x Aa    AA x aa
    (AA x AA, aa x aa)     1          1/4         1/8         0
    (AA x Aa, aa x Aa)     0          1/2         1/2         0
    Aa x Aa                0          1/4         1/4         1
    AA x aa                0           0          1/8         0

For the latent roots we have the equation

$$(\lambda - 1)(\lambda - \tfrac{1}{4})(\lambda^2 - \tfrac{1}{2}\lambda - \tfrac{1}{4}) = 0,$$

or $\lambda_1 = 1$,
$\lambda_{2,3} = \tfrac{1}{4} \pm \tfrac{1}{4}\sqrt{5}$,
$\lambda_4 = \tfrac{1}{4}$.

In this problem we are most interested in the proportion of
heterozygotes in any generation. If we call the probabilities for the
classes of mating in the table $p$, $q$, $r$ and $s$ (in that order),
the proportion of heterozygotes will be $h = r + \tfrac{1}{2}q$.
Hence, without finding the complete solution for $p$, $q$, $r$ and
$s$, we may write for $h$

$$h_r = A(\tfrac{1}{4} + \tfrac{1}{4}\sqrt{5})^r + B(\tfrac{1}{4} - \tfrac{1}{4}\sqrt{5})^r + C(\tfrac{1}{4})^r,$$

and determine $A$, $B$ and $C$ from the values $h_0$, $h_1$ and
$h_2$, which are readily found to be 1, $\tfrac{1}{2}$,
$\tfrac{1}{2}$ when $r_0 = 1$. (In the above formula for $h_r$
advantage has been taken of the obvious fact that there will be no
term $D(1)^r$ in $h_r$, but this of course need not be assumed.) Thus
we obtain

$$h_r = (\tfrac{1}{2} + \tfrac{1}{10}\sqrt{5})(\tfrac{1}{4} + \tfrac{1}{4}\sqrt{5})^r + (\tfrac{1}{2} - \tfrac{1}{10}\sqrt{5})(\tfrac{1}{4} - \tfrac{1}{4}\sqrt{5})^r,$$

... of progeny in a family (cf. Bartlett, 1937).
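The brother-sister mating chain is small enough to iterate exactly. The sketch below assumes the transition table with columns as parent matings, in the order (AA x AA, aa x aa), (AA x Aa, aa x Aa), Aa x Aa, AA x aa, and starts from the mating Aa x Aa; the variable and function names are illustrative. The closed form tested is the one obtained by fitting the constants to h_0, h_1 and h_2, for which the coefficient of the (1/4)^r term vanishes.

```python
from fractions import Fraction as F
import math

# Q[i][j] = P{progeny mating of class i | parent mating of class j}.
Q = [
    [F(1), F(1, 4), F(1, 8), F(0)],
    [F(0), F(1, 2), F(1, 2), F(0)],
    [F(0), F(1, 4), F(1, 4), F(1)],
    [F(0), F(0),    F(1, 8), F(0)],
]

def step(q, p):
    return [sum(q[i][j] * p[j] for j in range(4)) for i in range(4)]

def heterozygote_proportions(n):
    """h_0, ..., h_n with h = r + q/2, starting from the mating Aa x Aa."""
    p = [F(0), F(0), F(1), F(0)]
    h = []
    for _ in range(n + 1):
        h.append(p[2] + p[1] / 2)
        p = step(Q, p)
    return h

def closed_form(r):
    root5 = math.sqrt(5.0)
    return ((0.5 + root5 / 10) * ((1 + root5) / 4) ** r
            + (0.5 - root5 / 10) * ((1 - root5) / 4) ** r)

h = heterozygote_proportions(12)
```

Because only the two roots ¼ ± ¼√5 contribute, the sequence also satisfies the second-order recurrence h_{r+2} = ½h_{r+1} + ¼h_r, which the exact fractions reproduce.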


2.21 Classification by asymptotic behaviour. In the first of the
preceding two examples we saw that an oscillating or periodic
solution was possible. The extension of this type of solution to $k$
states is provided by a deterministic or permuting set of transitions
from one state to another. We suppose that $Q$ does not split up into
closed groups of permutations, but that each state is occupied in
turn. By a suitable numbering of the states the matrix $Q$ is then

$$Q = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 1 & \cdots & 0\\
\vdots & & & & \vdots\\
1 & 0 & 0 & \cdots & 0
\end{pmatrix}$$

with characteristic equation $\lambda^k = 1$, and latent roots

$$\lambda_j = \omega_j \equiv e^{2\pi ij/k} \quad (i^2 = -1;\ j = 1, \dots, k). \tag{1}$$

Thus

$$p_r = \sum_{j=1}^{k} a_j \omega_j^r s_j, \tag{2}$$

representing a process in which each element of the initial
probability distribution moves deterministically and cyclically
through the $k$ states, the whole cycle being repeated indefinitely.

By evaluating $\mathrm{adj}\,(Q - \omega_j I)$, we find (as is
directly obvious) $s_j$ is proportional to

$$(1, \omega_j, \omega_j^2, \dots, \omega_j^{k-1})$$

and $t_j$ to

$$(1, \omega_j^{-1}, \omega_j^{-2}, \dots, \omega_j^{-(k-1)}),$$

i.e. to

$$(1, \omega_j^*, \omega_j^{*2}, \dots, \omega_j^{*(k-1)}),$$

where $\omega_j^*$ is the complex conjugate of $\omega_j$. Thus in
this case $t_j$ is the complex conjugate of $s_j$ provided each is
adjusted in scale by the same factor $1/\sqrt{k}$.

From (2) we see that a limiting distribution in the strict sense does
not exist, though it does if we average over a large enough number of
consecutive times; i.e.

$$\lim_{n \to \infty} \frac{1}{n}\sum_{r=1}^{n} p_r = \pi_0, \tag{3}$$

say, where obviously $\pi_0$ satisfies the equation

$$(Q - I)\pi_0 = 0, \tag{4}$$

and represents here a uniform distribution.
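The distinction between a strict limit and an averaged limit is easily seen in the smallest cyclic case. The sketch below takes k = 3 with the permuting matrix in the form given above and an arbitrary initial distribution chosen for illustration; exact fractions are used so that the average over whole cycles is exactly uniform.

```python
from fractions import Fraction as F

# Cyclic (permuting) column-stochastic matrix for k = 3 states.
Q = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

def step(q, p):
    return [sum(q[i][j] * p[j] for j in range(len(p))) for i in range(len(q))]

def averaged_distribution(p0, n):
    """(1/n) * (p_1 + ... + p_n), the average appearing in equation (3)."""
    totals = [F(0)] * len(p0)
    p = p0
    for _ in range(n):
        p = step(Q, p)
        totals = [t + x for t, x in zip(totals, p)]
    return [t / n for t in totals]

p0 = [F(1, 2), F(1, 3), F(1, 6)]
p1 = step(Q, p0)
p3 = step(Q, step(Q, p1))
average = averaged_distribution(p0, 300)
```

Here `p1` differs from `p0` while `p3` equals `p0`, so the sequence never settles, yet the average over any whole number of cycles gives each state probability 1/3.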
Let us consider now the asymptotic behaviour of the general finite
chain. From the relations $|\Phi(Q)| = 0$ and $Qp_r = p_{r+1}$, we
see that $p_r$ satisfies the $k$th order difference equation

$$|\Phi(E_r)|\, p_r = 0,$$

where $E_r \equiv 1 + \Delta$ denotes the ‘shift’ operator in the
calculus of finite differences. It follows (see, for example,
Milne-Thomson (1933), Chapter XI) that the solution $p_r$ has the
general form

$$p_r = \sum_i q_i(r)\, \lambda_i^r, \tag{5}$$

where the summation is over all the distinct non-zero latent roots
$\lambda_i$ of $Q$, and the $q_i(r)$ are vectors whose components are
polynomials in $r$ of degree at most $m_i - 1$, where $m_i$ is the
multiplicity of $\lambda_i$. The expression (5) will of course result
from the operation $Q^r$ on $p_0$, where $Q^r$ is in general (Frazer,
Duncan and Collar, 1946, § 3.10)

$$Q^r = \sum_i \frac{1}{(m_i - 1)!}\left[\frac{d^{m_i-1}}{d\lambda^{m_i-1}}\left\{\frac{\lambda^r\, \mathrm{adj}\,\Phi(\lambda)}{\psi_i(\lambda)}\right\}\right]_{\lambda = \lambda_i}, \tag{6}$$

with $\psi_i(\lambda) = \prod_{j \ne i} (\lambda - \lambda_j)^{m_j}$,

... Thus as $r \to \infty$ ... constants, and hence the strict limit
$p_\infty$ ... the roots $\lambda_i$ are simple. In general, however,
this limiting distribution will not be independent of $p_0$. (The
above argument is due to D. G. Kendall.)

A detailed classification of the various possibilities has been given
by Fréchet (1937-8) (cf. also Feller, 1950). We shall conclude this
section by enumerating (without proof) some of the more important
types and properties.

(i) We call the process ergodic if $\pi_0$ (defined by equation (3)
above) is independent of $p_0$. This is true if and only if the
matrix $\Phi(1)$ is of rank $k - 1$ (or equivalently, if the latent
root 1 is simple).

(ii) The process is called regular if $p_\infty$ exists and is
independent of $p_0$. In terms of the latent roots $\lambda_i$ this
is true if and only if there is only one simple root $\lambda_1 = 1$
of modulus unity. An alternative necessary and sufficient condition
is that $Q^r$ for some finite $r$ has at least one row with all
non-zero elements.

(iii) The positively regular case has, moreover, all positive
(non-zero) elements in its ultimate distribution. An important
sufficient condition is that $Q$ has all non-zero elements. More
generally, a necessary and sufficient condition is that $Q^r$ for
some finite $r$ has all non-zero elements.

To see further how these cases arise in relation to the structure 
of the matrix Q, we classify the possible matrix types: 

(a) The matrix $Q$ we call reducible if it can be put in the form

$$\begin{pmatrix} P & U \\ 0 & V \end{pmatrix},$$

where $P$ is a square submatrix and $0$ denotes a submatrix of zeros. If $P$ cannot be further so reduced, it forms an irreducible closed set.
(b) The general canonical form for a matrix with a latent root 1 of multiplicity $n$ ($> 1$) is, after a suitable numbering of states,


p, 0 . 9 


The submatrices $P_i$ correspond to closed sets or final groups of states (an 'absorbing' state, which permits entry but no exit, provides a final group of one state).

(c) We call $Q$ primitive if it has no roots $\lambda_i$ of modulus unity other than the root 1. The canonical form for irreducible but non-primitive matrices (or submatrices) is

$$\begin{pmatrix}
0 & Q_1 & 0 & \cdots & 0\\
0 & 0 & Q_2 & \cdots & 0\\
\vdots & & & & \vdots\\
0 & 0 & 0 & \cdots & Q_{n-1}\\
Q_n & 0 & 0 & \cdots & 0
\end{pmatrix},$$

where $Q_1 Q_2 \cdots Q_n$ exists. Such a matrix is called cyclic.

We recall that the ultimate transition matrix $Q^\infty$ exists if $Q$ is primitive, though the ultimate distribution $Q^\infty p_0 = p_\infty$ is only independent of $p_0$ in the regular case. In terms of the above matrix classification, the important positively regular case holds if and only if $Q$ is both irreducible and primitive. A useful necessary and sufficient condition for the positively regular case among primitive matrices (no cyclic groups) is that all 'paths' from any state $i$ to any other state $j$ (including $i$) are possible (i.e. have non-zero probability for some finite number $r$ of time-intervals).


Infinite number of states. The above discussion has referred to 


necessarily be unity. If it always is 
non-dissipative, 

An alternative approach, due to Feller, to the asymptotic 
behaviour of P,, enabling the enumerable case to be included, 


makes use of recurrence theory, which will be developed in 
Chapter 3. 


, and other references given by him). We shall, however, consider here only 'linear' or one-dimensional systems. Physical systems are more often two- and three-dimensional, and their detailed treatment becomes complicated, though they may in principle be regarded as linear systems with a one- or two-dimensional layer of elements.

From the probability standpoint we define a linear nearest neighbour system in general as a one-dimensional array of elements with each of which a random variable $X$ is associated, the conditional distribution of any set of $X$'s, say $S \equiv X_1, \ldots, X_n$, depending only on the two nearest values known, one to the left and one to the right. For definiteness we shall in the numbering of these variables think of a chain evolving from the left to right. For a Markov chain $X_t$, let any set of $X$'s up to $t_0$ be denoted by $U = (U', X_0)$ and any set after $t_n$ by $V = (X_{n+1}, V')$. Then

$$P\{S \mid U, V\} = \frac{P\{U, S, V\}}{P\{U, V\}} = \frac{P\{V' \mid X_{n+1}, S, U\}\,P\{X_{n+1}, S \mid U\}\,P\{U\}}{P\{V' \mid X_{n+1}, U\}\,P\{X_{n+1} \mid U\}\,P\{U\}},$$

or from the Markov chain property for the ordered set $U', X_0, S, X_{n+1}, V'$,

$$P\{S \mid U, V\} = \frac{P\{X_{n+1}, S \mid X_0\}}{P\{X_{n+1} \mid X_0\}} = P\{S \mid X_0, X_{n+1}\}, \tag{1}$$

which is the nearest neighbour property. Conversely, it may be shown by similar arguments that in general the only linear nearest neighbour systems as above defined are Markov chains.
We may similarly deduce that a Markov chain preserves its Markovian character if reversed in time. For

$$P\{S \mid V\} = \frac{P\{S, V\}}{P\{V\}} = \frac{P\{V' \mid X_{n+1}, S\}\,P\{X_{n+1}, S\}}{P\{V' \mid X_{n+1}\}\,P\{X_{n+1}\}} = \frac{P\{X_{n+1}, S\}}{P\{X_{n+1}\}} = P\{S \mid X_{n+1}\}. \tag{2}$$

It should be pointed out that the conditional probabilities for the reversed process do not in general bear a simple relation to the forward conditional probabilities, for they will depend also on the absolute probabilities of the various states at the different instants. However, if we consider only stationary Markov chains (for definition of stationarity see § 1-3) the absolute probabilities will be independent of the time, i.e. in the present application to nearest neighbour systems, independent of the element of the system. If $\{q_{ij}\} = Q$ is the 'forward' conditional probability matrix, and $\{r_{ij}\} = R$ is the corresponding matrix for the reversed process, the simultaneous probability of $X_r = j$, $X_{r+1} = i$, is $q_{ij}P_j = r_{ji}P_i$. It is reasonable to suppose in the present context that $r_{ij} = q_{ij}$; it then follows that $P_i/P_j = q_{ij}/q_{ji}$. A particular case mentioned in § 2-2 was a symmetric matrix $Q$, for which $q_{ij} = q_{ji}$. The above relation then gives $P_i = P_j$, consistently with the uniform probability distribution noted for this case.
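The reversal relation $q_{ij}P_j = r_{ji}P_i$ can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `stationary` and `reversed_matrix` and the example matrices are illustrative assumptions, and $Q$ is taken column-stochastic, consistent with $Qp_r = p_{r+1}$.

```python
def stationary(Q, iters=2000):
    """Power iteration p <- Q p for a column-stochastic Q (columns sum to 1)."""
    k = len(Q)
    p = [1.0 / k] * k
    for _ in range(iters):
        p = [sum(Q[i][j] * p[j] for j in range(k)) for i in range(k)]
    return p


def reversed_matrix(Q, P):
    """Reversed-process matrix R with r_ji = q_ij P_j / P_i."""
    k = len(Q)
    # Outer index j, inner index i, so that R[j][i] holds r_ji.
    return [[Q[i][j] * P[j] / P[i] for i in range(k)] for j in range(k)]


# Hypothetical two-state chain: q_01 / q_10 = 0.3 / 0.1, so P_0 / P_1 should be 3.
Q2 = [[0.9, 0.3],
      [0.1, 0.7]]
P2 = stationary(Q2)
R2 = reversed_matrix(Q2, P2)

# Hypothetical symmetric chain: the stationary distribution should be uniform.
Q3 = [[0.5, 0.3, 0.2],
      [0.3, 0.4, 0.3],
      [0.2, 0.3, 0.5]]
P3 = stationary(Q3)
```

In the two-state case the reversed matrix coincides with the forward one, which is the situation $r_{ij} = q_{ij}$ supposed in the text.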

A feature of importance in the physical applications is the 
existence of ‘long-range order’, its absence corresponding to the 
independence of the probabilities of particular states of the 
elements of the system far apart. We have seen that this will 
ultimately follow between the zeroth and rth elements for large 
$r$ if the root $\lambda_1 = 1$ is simple and dominant. More precisely, the extent of the dependence will be related to the magnitude of the second largest root (in absolute value), $\lambda_2$, say.

A further use of the technique in the physical applications is 
based on the following method of obtaining average values. 


Suppose a function $G_n(X_0, X_1, \ldots, X_n)$ associated with a Markov chain $X_0, X_1, \ldots, X_n$ is of the product form

$$g(X_0, X_1)\,g(X_1, X_2) \cdots g(X_{n-1}, X_n).$$

Now by definition

$$E\{G_n\} = E_{n-1}\{G_{n-1}\,E_{n\mid n-1}\,g(X_{n-1}, X_n)\}, \tag{3}$$

where $E_{n-1}$ denotes averaging with respect to all the variables $X_0, X_1, \ldots, X_{n-1}$, and $E_{n\mid n-1}$ conditional averaging over $X_n$ for given $X_0, X_1, \ldots, X_{n-1}$. For a Markov chain the right-hand side of (3) has the same structure as the evaluation of $p_n$, with $p(x_n \mid x_{n-1})\,g(x_{n-1}, x_n)$ in place of $p(x_n \mid x_{n-1})$. Hence if we denote the new matrix replacing $Q$ by $R$, then after final summation over the $p_0$ distribution for $X_0$

$$E\{G_n\} = t_1' R^n p_0, \tag{4}$$

where $t_1'$ denotes $(1, 1, \ldots)$ as before. If in particular $R$ has $k$ simple roots $\lambda_i$ and hence a spectral resolution

$$R = \sum_{i=1}^{k} \lambda_i u_i v_i', \tag{5}$$

and if $p_0 = \sum_{i=1}^{k} f_i u_i$, then

$$E\{G_n\} = t_1' \sum_{i=1}^{k} f_i \lambda_i^n u_i. \tag{6}$$
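The matrix formula (4) can be compared with direct enumeration over all paths of a short chain. This is a sketch under stated assumptions: the function names, the two-state matrix and the weight function `g` are hypothetical choices made for illustration, with $Q$ column-stochastic so that `Q[b][a]` is the probability of moving from state `a` to state `b`.

```python
from itertools import product


def expect_product_bruteforce(Q, g, p0, n):
    """E{g(X0,X1) g(X1,X2) ... g(X_{n-1},X_n)} by summing over every path."""
    k = len(Q)
    total = 0.0
    for path in product(range(k), repeat=n + 1):
        prob = p0[path[0]]
        value = 1.0
        for a, b in zip(path, path[1:]):
            prob *= Q[b][a]      # transition a -> b
            value *= g(a, b)
        total += prob * value
    return total


def expect_product_matrix(Q, g, p0, n):
    """The same average as t1' R^n p0, with R[b][a] = Q[b][a] * g(a, b)."""
    k = len(Q)
    R = [[Q[b][a] * g(a, b) for a in range(k)] for b in range(k)]
    v = list(p0)
    for _ in range(n):
        v = [sum(R[b][a] * v[a] for a in range(k)) for b in range(k)]
    return sum(v)  # multiplication by the row vector (1, 1, ...)


Q = [[0.7, 0.4],
     [0.3, 0.6]]
p0 = [0.5, 0.5]


def g(a, b):
    return 1.5 if a != b else 0.8


brute = expect_product_bruteforce(Q, g, p0, 5)
fast = expect_product_matrix(Q, g, p0, 5)
```

The enumeration costs $k^{n+1}$ terms, whereas the matrix form needs only $n$ matrix-vector products, which is the practical point of the technique.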


Applying this technique to the function $\exp(\theta F_n)$, where $F_n$ is a function of the sum form

$$f(X_0, X_1) + f(X_1, X_2) + \ldots + f(X_{n-1}, X_n),$$

we note that the complete distribution of $F_n$ (through its m.g.f.) may be investigated. In this case, since $G_n = \exp(\theta F_n)$ is a function of $\theta$ with $G_n(0) = 1$, we may conveniently write (4) and (6) in the respective forms

$$E\{\exp(\theta F_n)\} = t_1' Q^n(\theta)\,p_0 \tag{7}$$

$$= t_1' \sum_i a_i(\theta)\,\lambda_i^n(\theta)\,s_i(\theta), \tag{8}$$

where $Q(0) = Q$, $\lambda_i(0) = \lambda_i$, etc. (It is possible to apply this technique also to the case of simultaneous variables $F_j$ ($j = 1, 2, \ldots$); see, for example, § 8-2.)

We assume that the process is regular, so that the moduli of all roots $\lambda_i(0)$, other than the simple root $\lambda_1(0) = 1$, are less than one. With this condition the cumulant function may for small $\theta$ as $n$ increases be written in the asymptotic form

$$K_n(\theta) \sim n\log\lambda_1(\theta) + \log[a_1(\theta)\,t_1' s_1(\theta)] \tag{9}$$

$$\sim n\log\lambda_1(\theta) \qquad (\lambda_1(\theta) \neq 1). \tag{10}$$

Even if (7) is not expressible in the simple form (8), this last asymptotic result (10) still follows under the regularity condition (from the general form of Sylvester's theorem; see, for example, Frazer, Duncan and Collar, 1946, § 4-15). This result shows that $K_n(\theta)$ may be regarded, asymptotically, as the cumulant function of a sum of equal independent components. Hence the distribution of $F_n$ tends to normality under these conditions, provided that its variance $\sigma_n^2$ is such that $\sigma_n^2/n$ is finite and non-zero, and $F_n$ is suitably scaled.
As a simple example of this method, consider a chain of trials with absolute probability $P$ of an event occurring at any trial, but a Markov chain dependence between successive trials with conditional or 'transition' probability matrix

$$Q = \begin{pmatrix} p_1 & q_2 \\ q_1 & p_2 \end{pmatrix}.$$

The number of 'transitions', treating the juxtaposition of the occurrence and non-occurrence of the event as a transition, can be studied by constructing an $F_n$ with contribution 0 if no transition occurs and 1 otherwise (e.g. if $X_r = 1$ if the event occurs and 0 otherwise, take $F_n = \sum_{r=1}^{n} (X_r - X_{r-1})^2$). The modified matrix becomes

$$Q(\theta) = \begin{pmatrix} p_1 & q_2 e^{\theta} \\ q_1 e^{\theta} & p_2 \end{pmatrix},$$

with latent roots

$$\lambda = \tfrac{1}{2}(p_1 + p_2) \pm \tfrac{1}{2}\sqrt{(p_1 - p_2)^2 + 4q_1 q_2 e^{2\theta}}.$$

The square root with the + sign gives the root $\lambda_1(\theta)$ with larger modulus, so that, provided neither $q_1$ nor $q_2$ is zero, the distribution of the total number of transitions tends to normality with mean $nm$ and variance $n\sigma^2$, where $m$ and $\sigma^2$ are the coefficients of $\theta$ and $\tfrac{1}{2}\theta^2$ in the expansion of $\log\lambda_1(\theta)$; that is,

$$m = \frac{2q_1 q_2}{q_1 + q_2}, \qquad \sigma^2 = \frac{4q_1 q_2\,(q_1^2 p_2 + q_2^2 p_1)}{(q_1 + q_2)^3}.$$

When, however, either $q_1$ or $q_2$, say $q_1$, is zero, $\lambda_1(\theta) = 1$, and the condition for a limiting normal distribution breaks down. It is directly obvious that when $q_1 = 0$ the event will always occur, once it has occurred at all, so that the quantity $F_n$ has then only two possible values 0 and 1. In this case the exact expression for $E\{\exp(\theta F_n)\}$ is $P + Q\{p_2^n + (1 - p_2^n)e^{\theta}\}$ (where $Q \equiv 1 - P$). Such a result will of course be obtained by the present technique; for we readily find

$$\lambda_1(\theta) = 1, \qquad \lambda_2(\theta) = p_2,$$

$$s_1(\theta) = (1, 0)', \qquad s_2(\theta) = (e^{\theta}, -1)',$$

$$t_1'(\theta) = (1, e^{\theta}), \qquad t_2'(\theta) = (0, -1),$$

$$(P, Q)' = (P + Qe^{\theta})\,s_1(\theta) - Q\,s_2(\theta),$$

whence the result follows. For large $n$, it becomes $P + Qe^{\theta}$, which again is given by the asymptotic formula (9) with $\log\lambda_1(\theta) = 0$.
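The expressions for $m$ and $\sigma^2$ in this example can be checked against numerical derivatives of $\log\lambda_1(\theta)$, and the exceptional case $q_1 = 0$ against the matrix form (7). The sketch below is illustrative only; the function names and the numerical values of $p_1$, $p_2$, $P$ are assumptions chosen for the check.

```python
import math


def lambda1(theta, p1, p2):
    """Larger latent root of Q(theta) = [[p1, q2 e^theta], [q1 e^theta, p2]]."""
    q1, q2 = 1 - p1, 1 - p2
    disc = (p1 - p2) ** 2 + 4 * q1 * q2 * math.exp(2 * theta)
    return 0.5 * (p1 + p2) + 0.5 * math.sqrt(disc)


def mean_var_per_step(p1, p2):
    """Closed forms for m and sigma^2 as given in the text."""
    q1, q2 = 1 - p1, 1 - p2
    m = 2 * q1 * q2 / (q1 + q2)
    var = 4 * q1 * q2 * (q1 ** 2 * p2 + q2 ** 2 * p1) / (q1 + q2) ** 3
    return m, var


def numeric_cumulants(p1, p2, h=1e-4):
    """Coefficients of theta and theta^2/2 in log lambda1, by central differences."""
    K = lambda t: math.log(lambda1(t, p1, p2))
    return (K(h) - K(-h)) / (2 * h), (K(h) - 2 * K(0.0) + K(-h)) / h ** 2


def mgf_by_matrix(theta, p1, p2, P, n):
    """E{exp(theta F_n)} = (1, 1) Q(theta)^n (P, 1 - P)'."""
    q1, q2 = 1 - p1, 1 - p2
    e = math.exp(theta)
    v = [P, 1 - P]
    for _ in range(n):
        v = [p1 * v[0] + q2 * e * v[1], q1 * e * v[0] + p2 * v[1]]
    return v[0] + v[1]


def mgf_frozen_case(theta, p2, P, n):
    """Exact expression quoted in the text for q1 = 0."""
    return P + (1 - P) * (p2 ** n + (1 - p2 ** n) * math.exp(theta))
```

For independent trials with $p_1 = p_2 = \tfrac{1}{2}$ the closed forms reduce to $m = \tfrac{1}{2}$, $\sigma^2 = \tfrac{1}{4}$, as expected for a sum of independent transition indicators.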
In this example on the number of transitions, the exceptional case arises from the 'freezing' or 'absorption' of the variable into a particular state once it has reached it; it is excluded if we modify the condition of regularity to that of positive regularity.

In the applications in statistical mechanics the possible simultaneous configurations or states $S_n \equiv (X_0, X_1, \ldots, X_n)$ of the system have associated energies $\epsilon_s$, and the configuration probabilities are assumed proportional to

$$\exp(\vartheta\epsilon_s), \quad \text{where } \vartheta = -1/(kT)$$

(in this expression $k$ is Boltzmann's constant and $T$ is the absolute temperature). The method of studying the thermodynamic properties of the system is to form the partition function

$$Z(\vartheta) = \sum_s \exp(\vartheta\epsilon_s), \tag{11}$$

this being equivalent to obtaining the m.g.f. of the total energy. For $E\{\exp(\theta\epsilon_s)\} = Z(\vartheta + \theta)/Z(\vartheta)$, whence

$$K(\theta) = \log Z(\vartheta + \theta) - \log Z(\vartheta),$$

and differentiating $K(\theta)$ with respect to $\theta$ and then putting $\theta = 0$ is equivalent to differentiating $\log Z(\vartheta)$ with respect to $\vartheta$. When it is assumed that $\epsilon_s$ is a potential energy function of the form

$$\epsilon_s = V(X_0, \ldots, X_n) = v(X_0, X_1) + v(X_1, X_2) + \ldots + v(X_{n-1}, X_n),$$

the summation in (11) may be treated like the averaging of the function $\exp(\theta F_n)$ previously considered. The form of the simultaneous configuration probability assumed is in fact equivalent to taking the symmetric form $\exp(\vartheta v(X_{r-1}, X_r))$ for the simultaneous probability $P\{X_{r-1}, X_r\}$ (apart from a constant factor), compatible with a stationary Markov chain in either direction.
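A small numerical sketch may make the partition-function argument concrete. It is not taken from the text: the function names, the two-state pair energy `v` and the value of $\vartheta$ are assumptions. The partition function (11) is summed through a matrix with elements $\exp(\vartheta v(a, b))$, and the mean energy is obtained both by enumeration and as the derivative of $\log Z(\vartheta)$.

```python
import math
from itertools import product


def partition_function(v, k, n, vartheta):
    """Z = sum over configurations (X0..Xn) of exp(vartheta * sum_r v(X_{r-1}, X_r))."""
    T = [[math.exp(vartheta * v(a, b)) for a in range(k)] for b in range(k)]
    w = [1.0] * k                      # sum over the free end X0
    for _ in range(n):
        w = [sum(T[b][a] * w[a] for a in range(k)) for b in range(k)]
    return sum(w)                      # sum over the other free end Xn


def mean_energy_bruteforce(v, k, n, vartheta):
    """Average energy under probabilities proportional to exp(vartheta * energy)."""
    Z = 0.0
    E = 0.0
    for s in product(range(k), repeat=n + 1):
        eps = sum(v(a, b) for a, b in zip(s, s[1:]))
        weight = math.exp(vartheta * eps)
        Z += weight
        E += eps * weight
    return E / Z


def mean_energy_from_logZ(v, k, n, vartheta, h=1e-5):
    """d log Z / d vartheta by a central difference."""
    up = math.log(partition_function(v, k, n, vartheta + h))
    down = math.log(partition_function(v, k, n, vartheta - h))
    return (up - down) / (2 * h)


def v(a, b):
    # Hypothetical symmetric pair energy: -1 for like neighbours, +1 for unlike.
    return -1.0 if a == b else 1.0
```

With `k = 2`, `n = 6` and `vartheta = -0.5`, the two estimates of the mean energy agree to within the accuracy of the finite difference.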


2-3 Multiplicative chains

A Markov chain of particular importance that is best discussed directly is one where each 'state' of the variable $X$ refers to the number of individuals existing at time $t_r$; the time $t_{r+1}$ corresponds to the next generation, which is obtained on the assumption that each individual of the $r$th generation independently gives rise to an entire probability distribution of possible numbers of offspring in the next generation. If we denote this distribution (for simplicity assumed constant) by its p.g.f. $G(z)$, the p.g.f. from two individuals is $G^2(z)$ (by the probability rules summarized in Chapter 1), from three $G^3(z)$, and so on. The recurrence relation for the resulting distribution of individuals in the $r$th generation is thus seen to be

$$\Pi_{r+1}(z) = \Pi_r(G(z)). \tag{1}$$

As $\Pi_r$ is itself a further functional iteration of $G$ (for $\Pi_0(z) = z$), equation (1) can be equivalently written, for one initial individual,

$$\Pi_{r+1}(z) = G(\Pi_r(z)). \tag{2}$$


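These equations are not in general explicitly soluble, but imply various important consequences. For the mean and variance of $\Pi_r(z)$ we have

$$m_r = \left[\frac{\partial\log\Pi_r}{\partial\log z}\right]_{z=1}, \qquad v_r = \left[\frac{\partial^2\log\Pi_r}{\partial(\log z)^2}\right]_{z=1}.$$

Hence from (1)

$$m_{r+1} = \left[\frac{\partial\log\Pi_r}{\partial\log G}\,\frac{\partial\log G}{\partial\log z}\right]_{z=1} = m_r m,$$

$$v_{r+1} = \left[\frac{\partial^2\log\Pi_r}{(\partial\log G)^2}\left(\frac{\partial\log G}{\partial\log z}\right)^2 + \frac{\partial\log\Pi_r}{\partial\log G}\,\frac{\partial^2\log G}{\partial(\log z)^2}\right]_{z=1} = m^2 v_r + v m_r,$$

where $m$ and $v$ are the mean and variance of $G(z)$. This readily gives

$$m_r = m^r m_0, \qquad v_r = m^{2r} v_0 + m^{r-1}(1 - m^r)\,m_0 v/(1 - m). \tag{3}$$

It will be noticed that $m_r$ and $v_r$ cannot both be constant except in the trivial case $m = 1$, $v = 0$. In the case $m = 1$, $v \neq 0$, the second term for $v_r$ becomes $r m_0 v$, and increases indefinitely with $r$.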
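The closed forms (3) follow from iterating the one-step relations $m_{r+1} = m\,m_r$ and $v_{r+1} = m^2 v_r + v\,m_r$. A minimal sketch of that check, with hypothetical function names and illustrative parameter values:

```python
def moments_by_recursion(m, v, m0, v0, r):
    """Iterate m_{r+1} = m m_r and v_{r+1} = m^2 v_r + v m_r for r generations."""
    mr, vr = m0, v0
    for _ in range(r):
        # The right-hand side uses the old m_r, as the recursion requires.
        mr, vr = m * mr, m * m * vr + v * mr
    return mr, vr


def moments_closed_form(m, v, m0, v0, r):
    """Equation (3), with the limiting form r * m0 * v when m = 1."""
    mr = m ** r * m0
    if m == 1:
        vr = v0 + r * m0 * v
    else:
        vr = m ** (2 * r) * v0 + m ** (r - 1) * (1 - m ** r) * m0 * v / (1 - m)
    return mr, vr
```

For example, with $m = 1.6$, $v = 0.24$ and a single initial individual ($m_0 = 1$, $v_0 = 0$), both functions give the same mean and variance after any number of generations.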
This theory of population fluctuations is linked with a problem first raised by Francis Galton at the end of the last century in connection with the extinction of family surnames. If each male individual in a population independently has a family containing $n$ sons, where $n$ has a distribution with p.g.f. $G(z)$, and so on for the next generation, what is the chance of any particular male line becoming extinct? A complete solution to this problem was not obtained until many years later, by Steffensen (1930). The chance required obviously exists as a limit, since $\Pi_r(0)$ must increase with $r$ and is bounded by 1. From the relation (1), since the sequence $0, G(0), G(G(0)), \ldots$ is steadily increasing (from the increasing character of $G(z)$ as $z$ increases from 0 to 1) and has the minimum solution $z_m$ of the equation $G(z) = z$ as its limit, we have for the required probability of extinction $\Pi_0(z_m)$, or if $\Pi_0(z) = z$, simply $z_m$. If $z_m = 1$, the probability of extinction is unity. As $\partial^2 G(z)/\partial z^2$ is necessarily positive for all $z$ from 0 to 1, there is at most one root of $G(z) = z$ other than $z = 1$, and this differs from $z = 1$ only if $[\partial G(z)/\partial z]_{z=1} > 1$. Thus the necessary and sufficient condition for certain extinction is $m \le 1$.

This chance of extinction was investigated by Lotka (1931) for the United States white population in 1920, and found to be nearly 0.9. For example, if for $G(z)$ he substituted the approximate expression $(0.482 - 0.041z)/(1 - 0.559z)$ fitted to the statistics of family sizes, the equation $G(z) = z$ became

$$0.482 - 1.041z + 0.559z^2 = (1 - z)(0.482 - 0.559z) = 0,$$

giving $z_m = 0.482/0.559 = 0.86$. The average value $E(n)$ of $n$ was correspondingly of course greater than 1, being 1.175.
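The figures quoted for Lotka's fitted p.g.f. can be reproduced by iterating $z \to G(z)$ from $z = 0$, which is the increasing sequence $0, G(0), G(G(0)), \ldots$ described above. This is an illustrative sketch; the function names are assumptions, and only the fitted expression for $G(z)$ comes from the text.

```python
def lotka_pgf(z):
    """Approximate p.g.f. of the number of sons, as quoted in the text."""
    return (0.482 - 0.041 * z) / (1 - 0.559 * z)


def extinction_probability(G, iters=10000):
    """Limit of 0, G(0), G(G(0)), ...: the smallest root of G(z) = z in [0, 1]."""
    z = 0.0
    for _ in range(iters):
        z = G(z)
    return z


def pgf_mean(G, h=1e-6):
    """Mean number of offspring, G'(1), by a central difference."""
    return (G(1 + h) - G(1 - h)) / (2 * h)


z_m = extinction_probability(lotka_pgf)
mean_sons = pgf_mean(lotka_pgf)
```

The iteration settles at $0.482/0.559$, and the numerical mean is close to the quoted 1.175.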

Another application of these results occurs in the theory of natural selection. A gene mutation occurring in a finite number $N$ of a species is certain to die out eventually if it confers no selective advantage. On the other hand, if there is a small advantage, so that $m = 1 + \epsilon$, we have an equation for $z_m$ by expanding the equivalent equation $\log G(z) = \log z$ in powers of $\log z$,

$$\log z = \log G(1) + \log z\left[\frac{\partial\log G}{\partial\log z}\right]_{z=1} + \tfrac{1}{2}(\log z)^2\left[\frac{\partial^2\log G}{\partial(\log z)^2}\right]_{z=1} + \ldots$$

This gives the approximate solution

$$\log z_m \sim 2(1 - m)/v = -2\epsilon/v,$$

or as $\Pi_0(z) = z^N$, the probability of extinction is approximately $\exp(-2N\epsilon/v)$. For a Poisson distribution for $G(z)$, so that $v = m = 1 + \epsilon$, and $\epsilon = 0.01$, say, we obtain $(0.9804)^N$ or odds of over 100 to 1 against extinction if $N$ is as much as 250 (the slightly more exact result $(0.9803)^N$ given by Fisher (1930) may be obtained by solving the equation $G(z) = z$ more exactly, given the Poisson p.g.f. $G(z) = e^{m(z-1)}$).
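The two figures 0.9804 and 0.9803 can be compared directly: the first is $\exp(-2\epsilon/v)$, the second the root of $G(z) = z$ for the Poisson p.g.f. The sketch below uses hypothetical function names and the text's values $\epsilon = 0.01$, $N = 250$.

```python
import math


def approx_extinction_root(eps, v):
    """Approximate root z_m = exp(-2 eps / v) from the expansion in log z."""
    return math.exp(-2 * eps / v)


def poisson_extinction_root(m, iters=20000):
    """Smallest root of exp(m (z - 1)) = z, by iterating z -> G(z) from 0."""
    z = 0.0
    for _ in range(iters):
        z = math.exp(m * (z - 1))
    return z


eps = 0.01
z_approx = approx_extinction_root(eps, 1 + eps)   # Poisson: v = m = 1 + eps
z_exact = poisson_extinction_root(1 + eps)

# Odds against extinction of a mutation present in N = 250 individuals.
p_extinct = z_approx ** 250
odds_against = (1 - p_extinct) / p_extinct
```

The iteration converges slowly here because $G'(z_m)$ is close to 1, hence the generous iteration count.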

The fact that extinction is certain even when $m = 1$ stresses the danger of our assuming that the mean remains representative of an evolutionary stochastic process. The result is reminiscent of the gambler's ruin problem (§ 2-1), and indeed can be identified with it if we think of the contribution to the next generation from each individual as an independent addition to a random walk. In terms of successive generations, however, the number of such additional components is equal to the number of individuals in the population at that epoch, so a time to extinction reckoned in terms of random walk components is in general much longer than the time in terms of generations.


When $m < 1$, ultimate extinction may be avoided by an extra immigration component, so that the iteration relation for the cumulant function becomes, say (where $H(\theta) = \log G(e^{\theta})$),

$$K_{r+1}(\theta) = K_r(H(\theta)) + J(\theta). \tag{4}$$

The limiting distribution, if it exists, is then given by the functional relation

$$K_\infty(\theta) = K_\infty(H(\theta)) + J(\theta). \tag{5}$$

For example, its mean and variance, obtained by differentiation, are easily found to be

$$m_\infty = \frac{m_J}{1 - m}, \qquad v_\infty = \frac{v_J}{1 - m^2} + \frac{m_J v}{(1 - m)(1 - m^2)}, \tag{6}$$

where $m_J$, $v_J$ and $m$, $v$ correspond to $J(\theta)$ and $H(\theta)$ respectively. These formulae reduce to those given by Haldane (1949) when $J(\theta)$ and $H(\theta)$ both represent Poisson distributions, so that $v_J = m_J$ and $v = m$. It may be seen from (6) that $v_\infty/m_\infty > 1$ if $v/m > 1$, $v_J/m_J \ge 1$, and a negative binomial distribution, defined by

$$\Pi(z) = [1 - \alpha(z - 1)]^{-\beta} \qquad (\alpha, \beta > 0), \tag{7}$$

is sometimes in the absence of an exact solution a useful approximation (with $\alpha = v_\infty/m_\infty - 1$, $\beta = m_\infty/\alpha$).
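Differentiating (4) once and twice at $\theta = 0$ gives the one-step relations $m_{r+1} = m\,m_r + m_J$ and $v_{r+1} = m^2 v_r + v\,m_r + v_J$, whose fixed point is (6). The following sketch iterates them and fits the negative binomial parameters of (7); the function names and the Poisson example values are assumptions for illustration.

```python
def limiting_moments(m, v, m_J, v_J):
    """Closed forms (6) for the limiting mean and variance, valid for m < 1."""
    m_inf = m_J / (1 - m)
    v_inf = v_J / (1 - m ** 2) + m_J * v / ((1 - m) * (1 - m ** 2))
    return m_inf, v_inf


def iterate_moments(m, v, m_J, v_J, generations=200):
    """Iterate the moment recursions implied by K_{r+1} = K_r(H) + J from an empty population."""
    mr, vr = 0.0, 0.0
    for _ in range(generations):
        mr, vr = m * mr + m_J, m * m * vr + v * mr + v_J
    return mr, vr


def negative_binomial_parameters(m_inf, v_inf):
    """alpha and beta of [1 - alpha (z - 1)]^(-beta) matching mean and variance."""
    alpha = v_inf / m_inf - 1
    beta = m_inf / alpha
    return alpha, beta


# Hypothetical Poisson offspring (m = v = 0.5) and Poisson immigration (m_J = v_J = 2).
m_inf, v_inf = limiting_moments(0.5, 0.5, 2.0, 2.0)
alpha, beta = negative_binomial_parameters(m_inf, v_inf)
```

In this doubly Poisson case $v_\infty/m_\infty = 1/(1 - m^2)$, so the limiting distribution is over-dispersed relative to a Poisson distribution even though both ingredients are Poisson.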
Returning to the case $m > 1$, with no immigration, we no longer
find a strict limiting distribution for the number $N_r$ of individuals, but we can consider the limiting distribution of the variable $U_r \equiv N_r/m^r$. From the m.g.f. equation equivalent to (2),

$$M_{r+1}(\theta) = G(M_r(\theta)), \tag{8}$$

and the relation

$$L_r(\theta) \equiv E\{e^{\theta U_r}\} = E\{e^{\theta N_r/m^r}\} = M_r(\theta/m^r),$$

we obtain

$$L_{r+1}(\theta) = G(L_r(\theta/m)), \tag{9}$$

or a limiting distribution must satisfy (for one initial individual)

$$L(m\theta) = G(L(\theta)) \tag{10}$$

(it has been shown by Harris (1948) that such a limiting distribution exists and is uniquely determined by (10)). From equation (9) we find in particular

$$m_r = 1, \qquad m^2 v_{r+1} = m v_r + v, \tag{11}$$

or

$$m_\infty = 1, \qquad v_\infty = v/[m(m - 1)].$$

It will be recalled that even for $m > 1$ there is still a non-zero chance of extinction given by $z_m$, and for the remaining continuous component of the limiting distribution we have

$$m_c = \frac{1}{1 - z_m}, \qquad v_c = \frac{v}{m(m - 1)(1 - z_m)} - \frac{z_m}{(1 - z_m)^2}, \tag{12}$$

formulae which could be used to obtain a rapid empirical fit, e.g. by a standard $\chi^2$ distribution of the type ($2x = \chi^2$)

$$f(x)\,dx = x^{\gamma - 1} e^{-x}\,dx/\Gamma(\gamma) \qquad (\gamma > 0).$$

For example, in the case

$$G(z) = 0.4z + 0.6z^2,$$

for which $m = 1.6$, $v = 0.24$, $z_m = 0$, we find $v_\infty = 0.25$. The distribution of $x = \tfrac{1}{2}\chi^2$ above has $m(x) = \gamma$, $v(x) = \gamma$, and hence we take $u = x/\gamma$ as our variable, with $\gamma = 4$. This approximation is compared in table 1 with the exact distribution given by Harris (1948), who computed this by first expanding the characteristic
function as a power series and then inverting it.

Table 1

(Each first difference is entered against the larger value of u; a question mark denotes an entry that cannot be read.)

   u      P{U ≤ u}    1st diff.    Approximation    1st diff.
  0.00     0.0000                     0.0000
  0.25     ?           ?              0.0190         0.0190
  0.50     0.1728      ?              0.1429         0.1239
  0.75     0.3455      0.1727         0.3528         0.2099
  1.00     0.5312      0.1857         0.5665         0.2137
  1.25     0.6995      0.1683         0.7350         0.1685
  1.50     0.8304      0.1309         0.8488         0.1138
  1.75     0.9186      0.0882         0.9182         0.0694
  2.00     0.9678      0.0492         0.9576         0.0394
  2.50     0.9975     (0.0297)        0.9897        (0.0321)
  3.00     0.999       ?              0.9977        (0.0080)
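The approximation column of table 1 can be recomputed from the fitted distribution: with $\gamma = 4$ the variable $x = \gamma u$ has density $x^{\gamma-1}e^{-x}/\Gamma(\gamma)$, whose distribution function has a closed form for integer $\gamma$. The sketch below is illustrative; the function names are assumptions.

```python
import math


def gamma_cdf_integer_shape(x, shape):
    """P{X <= x} for density x^(shape-1) e^(-x) / Gamma(shape), integer shape."""
    partial = sum(x ** j / math.factorial(j) for j in range(shape))
    return 1 - math.exp(-x) * partial


def fitted_shape(m, v):
    """gamma chosen so that u = x / gamma has variance v_inf = v / [m (m - 1)]."""
    v_inf = v / (m * (m - 1))
    return 1 / v_inf


def approximation(u, shape=4):
    """Approximate P{U <= u}, taking u = x / shape."""
    return gamma_cdf_integer_shape(shape * u, shape)


shape = fitted_shape(1.6, 0.24)   # close to 4 for G(z) = 0.4 z + 0.6 z^2
column = [round(approximation(u), 4)
          for u in (0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0)]
```

The values in `column` agree with the legible entries of the approximation column to four decimal places.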


More than one type of individual. The above methods are in principle immediately applicable to processes involving more than one type of individual. Thus equation (1) generalizes to

$$\Pi_{r+1}(\mathbf{z}) = \Pi_r(\mathbf{G}(\mathbf{z})), \tag{13}$$

where the changes from one generation to the next are represented by the vector operation $\mathbf{z} \to \mathbf{G}(\mathbf{z})$, and $\Pi$ denotes the vector or set of simultaneous p.g.f.'s arising from each possible type of initial individual. It is convenient to stress that (13) applies to each component of $\Pi$, to indicate its relation with the alternative extension of equation (2),

$$\Pi_{r+1}(\mathbf{z}) = \mathbf{G}(\Pi_r(\mathbf{z})), \tag{14}$$

in which the use of the vector $\Pi$ is essential. We shall see later when we consider corresponding equations in continuous time that the two alternative forms (13) and (14) both have important analogues.


Chapter 3
PROCESSES IN CONTINUOUS TIME


3-1 The additive process 

We saw at the beginning of the last chapter that the distribution of the cumulative sum $S_n$ of a number of completely independent and homogeneous components $X_r$ was determined by its cumulant function equation $K_n = nK$. When we consider the precise formulation of the corresponding problem in continuous time, it is natural to postulate that, for any intervals $t_1$ and $t_2$,

$$K(t_1 + t_2) = K(t_1) + K(t_2), \tag{1}$$

or equivalently, for continuous functions,

$$K(t) = tK(1) = tK, \tag{2}$$

say. We obtain one particular solution of this equation by recalling the limiting case, referred to in § 2-1, of an indefinitely large number of independent components in any interval $\Delta t$, with finite total mean and variance, for which

$$K(t) = mti\phi - \tfrac{1}{2}\sigma^2 t\phi^2. \tag{3}$$

As another case, suppose that events are occurring independently in time, such that the chance of an event in any small interval $\Delta t$ is $m\Delta t + o(\Delta t)$ (and the chance of none, $1 - m\Delta t + o(\Delta t)$). Then for the total number of events, equation (1) gives the relation†

$$\Delta K \equiv K(t + \Delta t) - K(t) = \log\{1 + m(e^{i\phi} - 1)\Delta t + o(\Delta t)\} = m(e^{i\phi} - 1)\Delta t + o(\Delta t), \tag{4}$$

or $\partial K/\partial t = m(e^{i\phi} - 1)$, whence

$$K(t) = mt(e^{i\phi} - 1), \tag{5}$$

which is the cumulant function of a Poisson distribution with mean $mt$.

† For any function $f(t)$ we shall denote the difference $f(t + \Delta t) - f(t)$ by $\Delta f$; $\Delta f$ may be denoted by $\delta f$ if $\Delta t$ is a 'small' interval $\delta t$. The strict differential is denoted by $df$.
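The passage from (4) to (5) can be illustrated numerically: multiplying the characteristic functions of many small independent intervals, each contributing $1 + m(e^{i\phi} - 1)\Delta t$, approaches $\exp\{mt(e^{i\phi} - 1)\}$ as $\Delta t \to 0$. A minimal sketch, with hypothetical function names and arbitrary illustrative values of $m$, $t$ and $\phi$:

```python
import cmath


def cf_from_small_intervals(m, t, phi, n):
    """Characteristic function of the count built from n intervals of length t / n."""
    dt = t / n
    step = 1 + m * (cmath.exp(1j * phi) - 1) * dt   # neglecting o(dt)
    return step ** n


def cf_poisson(m, t, phi):
    """exp K(t) with K(t) = m t (e^{i phi} - 1), as in equation (5)."""
    return cmath.exp(m * t * (cmath.exp(1j * phi) - 1))


coarse = abs(cf_from_small_intervals(2.0, 1.0, 0.8, 100) - cf_poisson(2.0, 1.0, 0.8))
fine = abs(cf_from_small_intervals(2.0, 1.0, 0.8, 1000000) - cf_poisson(2.0, 1.0, 0.8))
```

The discrepancy shrinks roughly in proportion to $1/n$, in line with the neglected $o(\Delta t)$ terms.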

The results (3) and (5) are of great importance in understanding the possible structure of additive processes, and indeed of Markov processes in general. The normal component (3) may be referred to as the diffusion component, in contrast with any non-zero contributions to the cumulative sum $S(t)$ associated with the sudden occurrences of events at random times, as indicated by Poisson components of the type (5). A detailed study of the general equation (1) by various writers (see, for example, Cramér, 1937) has in fact shown that the general solution is a linear superposition of the normal or diffusion component and discontinuous or transition components of this second type. The solution is given by Cramér in the form

$$K(t) = -\tfrac{1}{2}\sigma^2 t\phi^2 + t\int_{-\infty}^{\infty}\frac{e^{i\phi x} - 1 - i\phi x}{x^2}\,d\Omega(x), \tag{6}$$

where $\Omega(x)$ is bounded, never decreasing and continuous at $x = 0$, and where in both terms contributions to the mean (which we shall assume below to be finite) have been removed. (The interpretation of $\Omega(x)$ in the second component is that it represents the total variance contribution from discontinuities of different amounts $\le x$, the non-zero contribution from $x = 0$ given by $\sigma^2$ having been removed; the total variance increment per unit time is thus $\sigma^2 + \Omega(\infty)$, assumed finite.)

As a particular case of the second component, we may have a single process of the type (5) with an associated random variable $X$ with distribution $F(x)$. Then we obtain

$$K(t) = mt\int_{-\infty}^{\infty}(e^{i\phi x} - 1)\,dF(x) = mt\{C_X(\phi) - 1\}, \tag{7}$$

where $C_X(\phi)$ is the characteristic function for the random variable $X$.

The equations (3), (5) and (6) refer to homogeneous additive processes, in which the cumulant function in (1) depends only on the interval and not otherwise on the time. $K(t)$ is, however, the sum of increments $\Delta K \to 0$ with the interval $\Delta t$, and more general increments of this type depending on $t$ may be considered if required. Homogeneous stochastic processes are, however, of especial practical significance, as their evolution depends only on the initial conditions and the invariant structure of the process. A further interesting point is that even if the increment $\Delta K$ depends on the value of the variable $S(t)$ at time $t$, this assumption modifying the process from an additive one to a Markov one, the form of the increment $\Delta K$ indicated by (6) will still be correct for a small enough interval $\Delta t$.

In § 2-1 we considered the random walk problem in the presence of absorbing boundaries, and, in particular, noted that the method making use of Wald's identity could be applied to the limiting case of a large number of small steps, implying in effect its application to the normal or diffusion component (3). A re-examination of this method reveals that it may at once be extended to deal with the general 'random walk' or additive process in continuous time given by (6). In particular, when the boundaries are far away from the starting point, the asymptotic approximations will be applicable. As for large $t$ the general process (6) tends to the normal additive process (when suitably scaled), the normal theory results in § 2-1 are still asymptotically relevant.

It is, however, more direct for the normal diffusion process (3) (and hence asymptotically for the additive process (6)) to consider the differential equation obtained from (3), namely,

$$\frac{\partial K}{\partial t} = im\phi - \tfrac{1}{2}\sigma^2\phi^2, \tag{8}$$

an equation which will be more familiar if inverted back to the corresponding equation for the density function $f(x)$, which depends of course also on $t$,

$$\frac{\partial f}{\partial t} = -m\frac{\partial f}{\partial x} + \tfrac{1}{2}\sigma^2\frac{\partial^2 f}{\partial x^2}. \tag{9}$$

An unrestricted solution of (9) is obviously, from the equivalence to (8),

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2 t}}\exp\left\{-\frac{(x - mt)^2}{2\sigma^2 t}\right\}, \tag{10}$$

this being for the initial condition that the random walk starts from the origin $x = 0$ at $t = 0$. The general solution of (9) is a linear superposition of such solutions starting from arbitrary points $x$, and may be used to construct the appropriate solution in the presence of absorbing boundaries, at which $f(x)$ must vanish. We quote the result for $m = 0$ and boundaries $a$ ($< 0$) $= -b$ and $b$ ($> 0$). Such results may be conveniently obtained by a method of images, as in some physical problems (see Chandrasekhar, 1943; Bartlett, 1945),

$$f_1(x) = \sum_{s=-\infty}^{\infty}(-1)^s\,(2\pi\sigma^2 t)^{-1/2}\exp\{-\tfrac{1}{2}(x - 2sb)^2/(\sigma^2 t)\}. \tag{11}$$

An alternative case when $a \to -\infty$, obtained more easily as a single absorbing boundary problem, is (for $m = 0$)

$$f_2(x) = (2\pi\sigma^2 t)^{-1/2}\left[\exp\{-\tfrac{1}{2}x^2/(\sigma^2 t)\} - \exp\{-\tfrac{1}{2}(x - 2b)^2/(\sigma^2 t)\}\right]. \tag{12}$$

The corresponding solutions for $m \neq 0$ may be obtained by writing $f\exp\{(mx - \tfrac{1}{2}m^2 t)/\sigma^2\}$ for $f$.

A result which will be used later (§ 4-1) follows from the multiplication rule of probability and a comparison of (11) or (12) with (10), namely, that the probability of previously remaining within the prescribed boundaries, given that the position of the diffusing point at time $t$ is the same value $x = 0$ as its value at $t = 0$, is $f_1(0)/f(0)$ or $f_2(0)/f(0)$ respectively. For (11) and (12) respectively, these probabilities become

$$\sum_{s=-\infty}^{\infty}(-1)^s\exp\{-2s^2 b^2/(\sigma^2 t)\} \tag{13}$$

and

$$1 - \exp\{-2b^2/(\sigma^2 t)\}. \tag{14}$$
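The image solutions for $m = 0$ can be checked numerically: the alternating sum should vanish at the boundaries $\pm b$, and the ratios at the origin should reproduce (13) and (14). This is a sketch with hypothetical function names; the infinite sum is truncated at an assumed number of image terms.

```python
import math


def f_free(x, t, sigma2):
    """Unrestricted density (10) with m = 0."""
    return math.exp(-0.5 * x * x / (sigma2 * t)) / math.sqrt(2 * math.pi * sigma2 * t)


def f_two_boundaries(x, t, b, sigma2, terms=50):
    """Alternating sum of images at 2 s b for absorbing boundaries at -b and b."""
    return sum((-1) ** s * f_free(x - 2 * s * b, t, sigma2)
               for s in range(-terms, terms + 1))


def f_one_boundary(x, t, b, sigma2):
    """Single image at 2 b for one absorbing boundary at b."""
    return f_free(x, t, sigma2) - f_free(x - 2 * b, t, sigma2)


def prob_within_two(t, b, sigma2, terms=50):
    """Expression (13)."""
    return sum((-1) ** s * math.exp(-2 * s * s * b * b / (sigma2 * t))
               for s in range(-terms, terms + 1))


def prob_within_one(t, b, sigma2):
    """Expression (14)."""
    return 1 - math.exp(-2 * b * b / (sigma2 * t))
```

For instance, with `b = 1`, `sigma2 = 1` and `t = 0.5` the two-boundary density is zero at `x = 1` and `x = -1` to rounding error, and `f_one_boundary(0, ...) / f_free(0, ...)` equals `prob_within_one(...)`.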


To obtain the time taken to reach the boundaries, we note that the modified solutions (11) and (12) are probability densities in the extended sense used in § 2-1, part of the probability being lost at the boundaries as time proceeds. Thus if the probability density for the time taken to reach the boundaries is $g(t)$, then

$$g(t) = -\frac{\partial}{\partial t}\int_a^b f(x)\,dx. \tag{15}$$

This gives, for example, in the case $a \to -\infty$,

$$g(t) = \frac{b}{\sqrt{2\pi\sigma^2 t^3}}\exp\left\{-\frac{(b - mt)^2}{2\sigma^2 t}\right\}. \tag{16}$$

The distribution $g(t)$ is, in the terminology of § 3-3, a passage-time distribution, and the methods to be developed in that section provide yet another way of deriving it. The distribution for the two-boundary problem is compounded of the conditional distributions for each boundary separately, for each of which the relevant passage time is that conditional on not first reaching the other boundary.
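Two properties of the passage-time density (16) can be verified numerically. For $m = 0$ the integral of (12) over $x < b$ equals $\mathrm{erf}\{b/\sqrt{2\sigma^2 t}\}$, so (15) can be checked by a finite difference; for $m > 0$ the density should integrate to one with mean $b/m$. The sketch below uses hypothetical function names and, as an illustration, $b = 12$, $m = 2/7$, $\sigma^2 = 180/196$.

```python
import math


def passage_density(t, b, m, sigma2):
    """Equation (16): density of the time to first reach a single boundary at b."""
    return (b / math.sqrt(2 * math.pi * sigma2 * t ** 3)
            * math.exp(-0.5 * (b - m * t) ** 2 / (sigma2 * t)))


def survival_no_drift(t, b, sigma2):
    """Integral of (12) over x < b when m = 0: probability of not yet being absorbed."""
    return math.erf(b / math.sqrt(2 * sigma2 * t))


def trapezoid(fn, lo, hi, steps):
    """Plain trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (fn(lo) + fn(hi)) + sum(fn(lo + i * h) for i in range(1, steps))
    return total * h


# Check of (15) for m = 0 by a central difference in t.
h = 1e-4
numeric_g = -(survival_no_drift(30 + h, 12, 1.0) - survival_no_drift(30 - h, 12, 1.0)) / (2 * h)

# Normalisation and mean for positive drift towards the boundary.
b, m, sigma2 = 12.0, 2.0 / 7.0, 180.0 / 196.0
total = trapezoid(lambda t: passage_density(t, b, m, sigma2), 1e-3, 600.0, 60000)
mean = trapezoid(lambda t: t * passage_density(t, b, m, sigma2), 1e-3, 600.0, 60000)
```

With these values the mean passage time is $b/m = 42$.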
The Monte Carlo method. It will be noticed that just as the above diffusion results provide an asymptotic solution to the problems for random walk sequences discussed in Chapter 2, so conversely an approximate solution to problems connected with the diffusion equation (9) above could be obtained from actual trials, for example, by simulating the games between two players with the aid of random numbers. For example, in a classical problem propounded by C. Huyghens, the players A and B win a point on a throw of fourteen and eleven respectively with three dice, any other score being disregarded. It is easily found that $p = 15/42$. The players started with 12 counters each, so that $-a = b = 12$, the stake per game being one counter. For this problem, equation (14) of § 2-1 gives the probability of A winning (absorption at boundary $b$) as

244,140,625/282,673,677,106 = 0.000864,

the solution given by Huyghens. It is thus nearly certain that absorption will take place at boundary $a$; and the 'duration of play' will have a distribution approximately given by (16) (with $b = |a|$ and the sign of $m = p - q$ reversed; $\sigma^2 = 4pq$), the effect of the other boundary, and of the discrete steps in the gambling problem, being neglected.

The results from 100 artificial 'game' sequences (all of which resulted in A losing) are compared in fig. 1 with the distribution (16); the agreement, with as few as 12 steps to reach the boundary in the artificial games, is surprisingly good. Conversely, the observed histogram gives an approximate solution for the distribution (16) with these values for its parameters. Of course we should not employ the method in such a case, but the method may be useful in more complicated problems, for example, in solving partial differential equations representing random-walk problems in two or three dimensions, with more complicated boundary conditions. The reader interested in this topic should consult the literature on this subject (see, for example, the National Bureau of Standards (U.S.A.) publication on the Monte Carlo Method, 1951). It should perhaps be added that the use of random numbers and artificial samples for the empirical study of theoretical problems has long been familiar to statisticians, and will be illustrated again in later sections.
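An experiment of this kind is easily repeated with pseudo-random numbers. The sketch below is not the original computation: the function names, the seed and the number of games are assumptions. It uses the fact that 15 of the 216 throws of three dice give fourteen and 27 give eleven, so that A wins a decisive throw with probability 15/42.

```python
import random
from fractions import Fraction

P_A = Fraction(15, 42)   # chance that a decisive throw favours A


def exact_probability_a_wins(counters=12):
    """Ruin formula for equal initial holdings: p^a / (p^a + q^a)."""
    p, q = P_A, 1 - P_A
    return p ** counters / (p ** counters + q ** counters)


def play(rng, counters=12):
    """Play until A holds 0 or 2 * counters; return (A won, number of decisive throws)."""
    a, steps = counters, 0
    while 0 < a < 2 * counters:
        a += 1 if rng.random() < float(P_A) else -1
        steps += 1
    return a == 2 * counters, steps


def simulate(games=100, seed=1):
    """Number of games won by A and the mean duration over the simulated games."""
    rng = random.Random(seed)
    results = [play(rng) for _ in range(games)]
    wins = sum(1 for won, _ in results if won)
    mean_duration = sum(steps for _, steps in results) / games
    return wins, mean_duration
```

The mean of the fitted distribution (16) is $b/|m| = 12/(2/7) = 42$ decisive throws, against which `simulate` can be compared; a histogram of the individual durations would play the role of fig. 1.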


Fig. 1. Frequency distribution of the duration of play in 100 'games', yielding an approximate solution of a diffusion problem whose correct solution is given by the continuous curve. Horizontal axis: duration of game. (Reprinted from Applied Statistics, 2 (1953), 49.)


3-2 Markov chains


We shall next consider how t 
of Markov chains 


We first consider finite chains and suppose (i) that the probability of any change in the state in the interval $t$, $t + \Delta t$ is specified by the diagonal matrix

$$P(t)\,\Delta t + o(\Delta t),$$

and further (ii) that the asymptotic conditional distribution, given that a change occurs, is $S(t)$, so that the relation holds

$$Q(t_2 + \Delta t \mid t_1) = Q(t_2 + \Delta t \mid t_2)\,Q(t_2 \mid t_1) = [I - P(t_2)\Delta t]\,Q(t_2 \mid t_1) + [S(t_2)P(t_2)\Delta t]\,Q(t_2 \mid t_1) + o(\Delta t).$$

Hence

$$\frac{\partial Q(t_2 \mid t_1)}{\partial t_2} = \lim_{\Delta t \to 0}\frac{Q(t_2 + \Delta t \mid t_1) - Q(t_2 \mid t_1)}{\Delta t} = [S(t_2) - I]\,P(t_2)\,Q(t_2 \mid t_1) = R(t_2)\,Q(t_2 \mid t_1), \tag{1}$$

say, where $R(t_2) = [S(t_2) - I]\,P(t_2)$.

Equation (1) is the continuous analogue of equation (4) of § 2-2, and is to be regarded as a set of differential equations for each column of $Q$, starting with the corresponding column vector of the unit matrix at time $t_1$. Equivalently, for an arbitrary initial distribution $p(t_1)$, we obtain the equation for $p(t_2)$, by multiplying (1) to the right by $p(t_1)$,

$$\frac{\partial p(t_2)}{\partial t_2} = R(t_2)\,p(t_2). \tag{2}$$

If, alternatively, we consider a small change in $t_1$, we obtain similarly, from the relation

$$Q(t_2 \mid t_1 - \Delta t) = Q(t_2 \mid t_1)\,Q(t_1 \mid t_1 - \Delta t),$$

the equation (we assume $P(t)$ and $S(t)$ are continuous with respect to $t$)

$$\frac{\partial Q(t_2 \mid t_1)}{\partial t_1} = -Q(t_2 \mid t_1)\,R(t_1), \tag{3}$$

which is to be interpreted as a set of differential equations for each row of $Q$, the solution to satisfy the condition that it is the corresponding row of the unit matrix at time $t_2$.

The equations (1) and (3) will be termed the forward and
backward differential equations, and were first derived by
Kolmogorov (1931). The second or backward equation seems
less natural for evaluating $p(t_2)$ from $p(t_1)$, but, as we shall see
later, is related to a second powerful method of studying the
solution of Markov processes in general, the first being related
to the 'forward' equation, in which the last time change from
$t_2$ to $t_2 + \Delta t$ is considered.


For homogeneous chains, $Q(t_2 \mid t_1)$ will depend merely on the
difference $t_2 - t_1$, and $R$ will be independent of $t$. Moreover
$R$ will commute with $Q$, for

$$\begin{aligned}
[I + R\,\Delta t + o(\Delta t)]\,Q(t_2 - t_1) &= Q([t_2 + \Delta t] - t_1) \\
&= Q(t_2 - [t_1 - \Delta t]) \\
&= Q(t_2 - t_1)\,[I + R\,\Delta t + o(\Delta t)].
\end{aligned}$$

Thus the matrix equations (1) and (3) become identical, though
their interpretations as vector equations with different initial
conditions still remain distinct.
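The commutation of $R$ with $Q$ can be checked numerically. The following is a minimal sketch, not part of the original treatment: it assumes an illustrative two-state rate matrix (columns summing to zero, as required by $\partial p/\partial t = Rp$) and evaluates $Q(t) = \exp(Rt)$ by its power series. The helper names `matmul` and `expm` are hypothetical.

```python
# Illustrative two-state rate matrix; each column sums to zero so that
# dp/dt = R p conserves total probability. The values are assumptions.
R = [[-1.0, 2.0],
     [1.0, -2.0]]

def matmul(a, b):
    """Plain matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(r, t, terms=60):
    """Q(t) = exp(R t) summed from its power series (fine for small R t)."""
    n = len(r)
    q = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in q]
    for k in range(1, terms):
        # term becomes (R t)^k / k!
        term = matmul(term, [[x * t / k for x in row] for row in r])
        q = [[q[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return q

Q = expm(R, 0.7)
forward = matmul(R, Q)     # R Q, the right-hand side of the forward equation
backward = matmul(Q, R)    # Q R, the form taken by the backward equation
column_sums = [Q[0][j] + Q[1][j] for j in range(2)]
```

Here `forward` and `backward` should agree to rounding error, and each column of `Q` should sum to one, since each column is a conditional distribution.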


... time (this we can always ... though not conversely). In fact we obviously have the formal
equivalence $Q^t = \exp(Rt)$ ... set of conditional proba... The latent
vectors of $R$ and the matrix $Q$ are consequently the same, and
their corresponding latent roots connected by the equation
$\exp \lambda_R = \lambda_Q$, so that the latent roots of $R$ are such that there is ...

... the latent roots directly. As $|R - \lambda I| = 0$,
$t'(R - \lambda I) = 0$ has at least one solution $t'$, the
maximum component (in absolute value) of $t'$ being $t_m$ say.


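Then for the corresponding element in the last equation

$$\|r_{mm} - \lambda\|\,\|t_m\| = \Big\|\sum_{i \neq m} r_{im} t_i\Big\| \le \|t_m\| \sum_{i \neq m} \|r_{im}\| = \|t_m\|\,\|r_{mm}\|,$$

or $\|r_{mm} - \lambda\| \le \|r_{mm}\|$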



(where, as before, $\|x\|$ denotes the absolute value of any quantity
$x$). This implies that $\lambda$ is inside or on the circle, centre $r_{mm}$
($<0$) and radius $\|r_{mm}\|$, i.e. either $\lambda = 0$ or the real part of $\lambda$ is
less than zero. The possibility of multiple zero roots is not
excluded, and corresponds to the limiting distribution, which
now always exists, depending on the initial conditions.
The discussion of limiting properties of the formal solutions

$$Q(t \mid 0) = e^{Rt}, \qquad p(t) = e^{Rt} p_0,$$

of (1) and (2) in the homogeneous case also proceeds similarly to
the discussion in §2.2, but with the possibility of cyclic motion
now excluded. It will be sufficient to note that the limiting
properties for continuous time must be compatible with those
for discrete time (but not conversely). Since only primitive
matrices $Q$ are now admissible, we have the result that

$$\lim_{t \to \infty} p_t = P_0$$

always exists. The canonical form for $R$ corresponding to classification
by final groups is similar to that given for $Q$ in §2.21,
as may be seen from the relation

$$Q(\epsilon \mid 0) = I + \epsilon R + o(\epsilon)$$

for small $\epsilon$ (for a more direct derivation, see Ledermann, 1950).
The condition for regularity ($P_0$ independent of $p_0$) in terms
of the latent roots is that $R$ is of rank $k-1$ (or the root $\lambda_1 = 0$
is simple); the condition for positive regularity is that $R$ is
irreducible.

Infinite number of states. As in the case of discrete time, we 
may encounter chains with an infinite number of states. Our 
notation still formally applies in this case, but we have of course 
no longer any guarantee that the extension permits mathematic- 
ally valid and unique solutions. This question has been studied 


in great detail by several writers, though it is not usually an 
acute one in practical applications. It will be convenient to 
defer its discussion at present, and we conclude this section on 
Markov chains in continuous time with three examples. 
Example 1. Finite chain with simple roots. In the case of a finite chain,
considered for simplicity with simple roots, we have a spectral resolution
for $R$,

$$R = \sum_{i=1}^{k} \lambda_i s_i t_i' \qquad (4)$$

say, where $\lambda_1 = 0$ but is included to complete the set of latent row and
column vectors, so that $\sum_i s_i t_i' = I$. Then

$$p_t = \sum_{i=1}^{k} e^{\lambda_i t} s_i\,(t_i' p_0) = s_1 + \sum_{i=2}^{k} e^{\lambda_i t} s_i a_i. \qquad (5)$$

If we suppose that $N$ individuals or particles independently move from
one state to another according to the above process, and we label the $j$th
state by a p.g.f. variable $z_j$, writing $z' = (z_1, z_2, \dots, z_k)$, then the p.g.f. for
all $N$ individuals is

$$(z' p_t)^N = \Big(z' s_1 + \sum_{i=2}^{k} e^{\lambda_i t} a_i\, z' s_i\Big)^N \to (z' s_1)^N, \qquad (6)$$

as $t \to \infty$.
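As a particular case, suppose

$$R = \begin{pmatrix} 0 & \lambda \\ 0 & -\lambda \end{pmatrix} = -\lambda \begin{pmatrix} 1 \\ -1 \end{pmatrix} (0, \; -1),$$

corresponding to a rate $\lambda$ from the ...
Then all individual ...

$$\Pi_t(z) = [1 + e^{-\lambda t}(z - 1)]^N. \qquad (7)$$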

Example 2. The general homogeneous birth process. We consider next
a simple case of an enumerable chain, where only one-way transitions
can occur from a 'ground' or 'zero' state to the one immediately above it;
for example, the total number of occurrences of some event can increase
but not decrease. This is termed a 'birth' or 'escalator' process, and was
first discussed in a remarkable early paper by McKendrick (1914), who
showed that the negative binomial distribution can arise as a special
case of it.

Let the transition probability from the $i$th to the $(i+1)$th state be
$\lambda_i \Delta t + o(\Delta t)$, so that

$$\frac{\partial p_i}{\partial t} = \lambda_{i-1} p_{i-1} - \lambda_i p_i \quad (i > 0), \qquad \frac{\partial p_0}{\partial t} = -\lambda_0 p_0, \qquad (8)$$

or by direct solution

$$p_i(t) = e^{-\lambda_i t} \int_0^t \lambda_{i-1}\, e^{\lambda_i u}\, p_{i-1}(u)\, du,$$

if $p_i = 0$ ($i \neq 0$) at $t = 0$. Hence

$$p_0 = e^{-\lambda_0 t}, \qquad p_1 = \frac{\lambda_0 (e^{-\lambda_0 t} - e^{-\lambda_1 t})}{\lambda_1 - \lambda_0},$$

and by induction (the results follow somewhat more simply by the use
of Laplace transforms with respect to the time)

$$p_i = (-1)^i \lambda_0 \lambda_1 \dots \lambda_{i-1} \sum_{j=0}^{i} \frac{e^{-\lambda_j t}}{(\lambda_j - \lambda_0)(\lambda_j - \lambda_1) \dots (\lambda_j - \lambda_{j-1})(\lambda_j - \lambda_{j+1}) \dots (\lambda_j - \lambda_i)}. \qquad (9)$$

(The case of some $\lambda_i$ equal may easily be inferred from (9).) The negative
binomial distribution follows from the particular case $\lambda_i = \lambda + i\mu$; it may
also be derived by the generating-function methods used in
the further discussion in §3.4 of multiplicative chains.

The necessary and sufficient condition for the birth process equations (8)
to have a unique and valid solution for finite $t$, such that $\sum_i p_i = 1$, was
shown by Feller to be the divergence of the sum $\sum_i 1/\lambda_i$. This condition
will appear later as a special case of a more general condition when
'deaths', i.e. transitions downwards by one, are also possible.
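As a check on the closed form for the birth process, the sketch below compares it with a direct step-by-step integration of the differential equations for $p_i$. It is an illustration only: it assumes distinct rates of the form $\lambda_i = \lambda + i\mu$ with the arbitrary values $\lambda = 1$, $\mu = 0.5$, and the function names are hypothetical. The closed form is read here as $p_i = (-1)^i \lambda_0 \dots \lambda_{i-1} \sum_j e^{-\lambda_j t} / \prod_{k \neq j} (\lambda_j - \lambda_k)$.

```python
import math

def birth_probs_formula(rates, i, t):
    """p_i(t) from the closed form; requires rates[0..i] to be distinct."""
    lam = rates[: i + 1]
    prefactor = (-1) ** i * math.prod(lam[:i])
    total = 0.0
    for j, lj in enumerate(lam):
        denom = math.prod(lj - lk for k, lk in enumerate(lam) if k != j)
        total += math.exp(-lj * t) / denom
    return prefactor * total

def birth_probs_euler(rates, t, steps=100_000):
    """Integrate dp_i/dt = lam_{i-1} p_{i-1} - lam_i p_i by Euler steps,
    starting in the zero state with probability one."""
    n = len(rates)
    p = [1.0] + [0.0] * (n - 1)
    dt = t / steps
    for _ in range(steps):
        new = p[:]
        new[0] -= rates[0] * p[0] * dt
        for i in range(1, n):
            new[i] += (rates[i - 1] * p[i - 1] - rates[i] * p[i]) * dt
        p = new
    return p

rates = [1.0 + 0.5 * i for i in range(6)]   # lambda_i = lambda + i*mu (assumed values)
t = 1.0
closed = [birth_probs_formula(rates, i, t) for i in range(5)]
numeric = birth_probs_euler(rates, t)[:5]
```

With these rates the values in `closed` should also follow the negative binomial form $(i+1)\,e^{-t}(1 - e^{-t/2})^i$, in line with the remark on the case $\lambda_i = \lambda + i\mu$.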
Example 3. 'Contagion' processes. The particular case $\lambda_i = \lambda + i\mu$ is
sometimes contrasted with the simple Poisson additive process $\lambda_i = \lambda$,
as the chance of an additional event (e.g. an
accident) now depends on the number that has already occurred. Now it
is well known in statistics that alternatively the negative binomial can
be obtained from the Poisson by a heterogeneity effect, the observed
distribution then being a superposition of individual Poisson distributions
with variable mean $\lambda$ per unit time interval. For the resultant p.g.f.
will then be

$$E\{e^{\lambda t(z-1)}\} = M(t(z-1)), \qquad (10)$$

where if (and only if) we choose

$$M(\theta) = (1 - \rho\theta)^{-a}, \qquad (11)$$

corresponding to the frequency function for $\lambda$,

$$f(\lambda)\,d\lambda = \rho^{-a} \lambda^{a-1} e^{-\lambda/\rho}\,d\lambda/(a-1)! \quad (\lambda \ge 0),$$

then

$$M(t(z-1)) = [1 - \rho t(z-1)]^{-a}, \qquad (12)$$

a negative binomial distribution.
This dual interpretation of the negative binomial, indicating that the
observed frequency distribution is not sufficient to discriminate between
true contagion and heterogeneity, and that a more detailed study of the
numbers of events for different ... is necessary, has been
generalized by O. Lundberg for the more general compound distribution
of the type (10) above, which may also alternatively be generated by
a contagion process.

Taylor expansion of (10) about $z = 0$ gives

$$M(t(z-1)) = M(-t) + zt\,M'(-t) + \dots + \frac{(zt)^n}{n!} M^{(n)}(-t) + \dots, \qquad (13)$$

where $M'(\theta) = \partial M(\theta)/\partial\theta$, etc. Define

$$\lambda_n(t) = M^{(n+1)}(-t)/M^{(n)}(-t), \qquad (14)$$

and the conditional probability connecting the numbers of events $n$
and $m$ at times $t$ and $s$ ($s < t$),

$$q_{nm}(t, s) = \frac{(t-s)^{n-m}}{(n-m)!}\,\frac{M^{(n)}(-t)}{M^{(m)}(-s)} \quad (n \ge m). \qquad (15)$$

The differential equation

$$\frac{\partial q_{nm}(t, s)}{\partial t} = \lambda_{n-1}(t)\,q_{n-1,m}(t, s) - \lambda_n(t)\,q_{nm}(t, s) \qquad (16)$$

is satisfied by (15) when $\lambda_n(t)$ is given by (14), and also $q_{nm}(t, s) \to \delta_{n,m}$ as
$t \to s$ ($\delta_{n,m} = 1$ if $n = m$, $0$ otherwise). The distribution at time $t$, given
$m = 0$ at $s = 0$, is

$$p_n(t) = q_{n0}(t, 0) = t^n M^{(n)}(-t)/n!, \qquad (17)$$

which agrees with the coefficient of $z^n$ in (13).
In spite of this generalization, the negative binomial has a distinctive
place in that it is only for this compound Poisson distribution that the
corresponding contagion process can be made homogeneous by suitable
choice of the time scale. For if


then the relation 


$$\lambda_{n+1}(t) = \lambda_n(t) - \lambda_n'(t)/\lambda_n(t) \quad (n = 0, 1, \dots), \qquad (18)$$

which readily follows from (14), becomes 
Pata — Pg — pn? 
This equation must hold for any $n > 0$ and any $t > 0$, so that both sides
must be constant; this gives

immigration Process of §3- 
n zero) is such an example. 
to consider the theory at this stage. We shall include the appropriate
results for sequences, the formalism being sufficiently
similar for the various typical cases all to be discussed together.

Suppose we wish to study the recurrence of a particular state
$S$ for a random sequence, given that $S$ occurred at time zero.
We denote the probability of $S$ at any subsequent time $r$, regardless
of the states occurring between the times $0$ and $r$, by

$$P\{S_r \mid S_0\} = \{e_r\},$$

this being assumed non-zero for at least some $r > 0$; all such
probabilities are conditional on $S$ having occurred at time zero.
We use the further symbolic notation

$$P\{\text{not } S_r \mid S_0\} = \{\bar e_r\} = 1 - \{e_r\}, \qquad P\{S_r, S_s \mid S_0\} = \{e_r e_s\},$$
$$P\{\text{not } S_r, S_s \mid S_0\} = \{\bar e_r e_s\} = \{(1 - e_r)\,e_s\} = \{e_s\} - \{e_r e_s\},$$

etc. Then for the probability of a first recurrence at time $n$
we have

$$\begin{aligned}
\{\bar e_1 \bar e_2 \dots \bar e_{n-1} e_n\} &= \{(1 - e_1)(1 - e_2) \dots (1 - e_{n-1})\,e_n\} \\
&= \{e_n\} - \sum_{r=1}^{n-1} \{e_r e_n\} + \sum_{r=1}^{n-2} \sum_{s=r+1}^{n-1} \{e_r e_s e_n\} - \dots \qquad (1)
\end{aligned}$$

This formula in theory determines the recurrence distribution,
for the probabilities on the right are theoretically available for
any validly specified random sequence. For stationary processes
they will be independent of the choice of the time origin 0.

If the discrete time $r$ is replaced by a continuous time $t$, and
we assume that the probability of $S$ occurring in the time
interval $t$, $t + \Delta t$ is $[e_t]\,\Delta t + o(\Delta t)$ (and $o(\Delta t)$ of occurring more
than once), the formula (1) becomes replaced by the (formal)
relation giving the recurrence probability density at time $t$ as

$$[e_t] - \int_0^t [e_u e_t]\,du + \int_0^t\!\!\int_u^t [e_u e_v e_t]\,dv\,du - \dots \qquad (2)$$


The formal solution (2) may be useful when the set of different
states is non-enumerable, for example, if $S$ refers to the value
of a continuous spatial coordinate; it is of no interest without
modification if $S$ is one of a finite or enumerable group of
states, as in a Markov chain, for $\{e_t\}$ would in general not be of
the form $[e_t]\,dt$, and recurrence on the above definition would
almost certainly be instantaneous. In such cases we exclude
instantaneous recurrence or continuation in state $S$, as we shall
see presently.

We define a process to be of the renewal or regenerative type 
with respect to the state S if the occurrence of S is sufficient to 
regenerate the process, probabilities being no longer dependent 
on other past history. This definition is wider than that of a 
Markov process, for which every state has the regenerative pro- 
perty. It is consistent with the ‘renewals’ treated in § 2-11, the 
state S being then taken to be the wearing out of an article and 
its replacement by a new one. In the renewal case, if we also 
assume homogeneity, there is a great simplification in formulae 
(1) and (2). For we then have 


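$$\{e_r e_n\} = \{e_r\}\{e_{n-r}\}, \qquad \{e_r e_s e_n\} = \{e_r\}\{e_{s-r}\}\{e_{n-s}\},$$

etc., and forming generating functions on both sides of (1),
i.e. writing

$$\Pi_R(z) = \sum_{n=1}^{\infty} \{\bar e_1 \bar e_2 \dots \bar e_{n-1} e_n\}\,z^n, \qquad \Pi(z) = \sum_{n=1}^{\infty} \{e_n\}\,z^n,$$

we obtain (for $0 < z < 1$)

$$\Pi_R = \Pi - \Pi^2 + \Pi^3 - \dots = \Pi/(1 + \Pi). \qquad (3)$$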


This formula can be obtained more simply (see Feller, 1950,
Chapter 12), but it is instructive to make use of the above procedure,
which will be used again in the case of passage times. It
should be emphasized that (3) determines $\Pi_R$ in terms of $\Pi$ or
conversely; the converse situation arises in the renewal or
replacement theory of an article with the 'lifetime' distribution
specified by $\Pi_R$.
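Written as $\Pi = \Pi_R (1 + \Pi)$, the renewal relation says that the occurrence probabilities satisfy $e_n = f_n + \sum_{k<n} f_k\, e_{n-k}$, where $f_n$ is the first-recurrence probability. The sketch below solves this recursively. The function name and the test case (an event occurring independently with probability `p` at each step, which should give a geometric recurrence time) are illustrative assumptions.

```python
def first_recurrence(e):
    """Given e[n] = P{S at n | S at 0} for n = 1..N (e[0] is unused),
    return f[n], the probability that S first recurs at time n, from
    e_n = f_n + sum_{k<n} f_k e_{n-k}."""
    n_max = len(e) - 1
    f = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        f[n] = e[n] - sum(f[k] * e[n - k] for k in range(1, n))
    return f

# Illustrative case: S occurs independently with probability p at every step,
# so {e_n} = p for all n.
p = 0.3
e = [0.0] + [p] * 10
f = first_recurrence(e)
```

For this case `f[n]` should equal `p * (1 - p) ** (n - 1)`. The same routine run in reverse (solving for `e` given `f`) corresponds to the renewal or replacement problem mentioned above.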

For continuous time, in problems for which a recurrence
density in the sense of equation (2) exists, we replace generating
functions by Laplace transforms in $t$, so that $z^n$ is replaced by
$e^{-\psi t}$. Thus for

$$M(\psi) = \int_0^{\infty} e^{-\psi u}\,[e_u]\,du, \qquad (4)$$

and similarly for $M_R$, (3) is replaced by

$$M_R = M/(1 + M), \qquad (5)$$

with the inverse relation of renewal theory (see equation (6)
of §2.11)

$$M = M_R/(1 - M_R). \qquad (6)$$

The purely random occurrence of an event or state $S$ at average
rate $\mu$ (considered in §2.11 and again in §3.1), so that

$$M = \int_0^{\infty} e^{-\psi u}\,\mu\,du = \mu/\psi, \qquad (7)$$

gives from (5) $M_R = 1/(1 + \psi/\mu)$, which is the transform of the
exponential distribution $\mu e^{-\mu t}\,dt$.


Fig. 2. [Diagram: occupation of state S along the time axis from 0 to t.]
In the case of continuous time, but a finite or enumerable
set of states as in a Markov chain, we could proceed to modify
the recurrence time in (3) by removing the term $\{e_1\} = p$, say,
representing continuation in the state $S$, and measure the
recurrence time from $t = 1$ for all processes for which we know
the state at $t = 1$ is not $S$. The modified generating function is

$$\Pi_R' = \frac{\Pi_R - pz}{z(1 - p)}, \qquad (8)$$

and for continuous time this will usually give a definite limiting
function $M_R'$ if we write $z = e^{-\psi\tau}$ and let $\tau \to 0$. However, it is
easier to use other arguments. The probability $P\{S_t \mid S_0\} = \{e_t\} = p(t)$
is assumed non-zero, and we apply a renewal type of argument
to the sequence of events (see fig. 2): $S$ from $t = 0$ to $u$, then
departure from $S$ and first return at $v$, then occurrence of $S$ at $t$
(without conditions on the state during the interval $v$, $t$). The
chance of leaving $S$ in time $dt$ is assumed to be $\lambda\,dt$, and the
recurrence distribution for the time from departure to return is
given by the density function $f(t)$. Then we must have

$$p(t) = e^{-\lambda t} + \int_0^t\!\!\int_u^t \lambda e^{-\lambda u}\,f(v - u)\,p(t - v)\,dv\,du,$$

whence, taking Laplace transforms, we obtain

$$G(\psi) = \frac{1}{\lambda + \psi} + \frac{\lambda}{\lambda + \psi}\,M_R\,G(\psi),$$

whence finally

$$M_R = 1 + \psi/\lambda - 1/(\lambda G). \qquad (9)$$


Passage times. These formulae may be extended to passage
times from one state $S$ to a different state $T$. For sequences let

$$P\{S_r \mid S_0\} = \{e_r\}, \qquad P\{T_r \mid T_0\} = \{f_r\},$$
$$P\{S_r \mid T_0\} = \{g_r\}, \qquad P\{T_r \mid S_0\} = \{h_r\},$$

where it is now important to specify explicitly whether $S$ or $T$
is given as the initial state. Consider

(a) the first passage from $S$ to $T$, irrespective of the recurrence
of $S$. We have

$$\{(1 - h_1)(1 - h_2) \dots (1 - h_{n-1})\,h_n\} = \{h_n\} - \sum_{r=1}^{n-1} \{h_r h_n\} + \sum_{r=1}^{n-2} \sum_{s=r+1}^{n-1} \{h_r h_s h_n\} - \dots \qquad (10)$$

For homogeneous renewal or regenerative processes (the renewal
property now of course being assumed with respect both
to $S$ and $T$), (10) becomes

$$\Pi_P^{ST} = \Pi^{ST} - \Pi^{ST}\Pi^{TT} + \Pi^{ST}(\Pi^{TT})^2 - \dots,$$

or

$$\Pi_P^{ST} = \Pi^{ST}/(1 + \Pi^{TT}), \qquad (11)$$

where $\Pi_P^{ST}$ is the (unconditional) first passage time generating
function, and

$$\Pi^{ST} = \sum_{r=1}^{\infty} \{h_r\}\,z^r,$$

etc. If we alternatively require

(b) first passage to $T$, without recurrence of $S$, i.e. the passage
time to $T$ measured from the last occurrence of $S$, then the
formulae become somewhat more complicated. We must now
consider

$$\{(1 - h_1 - e_1)(1 - h_2 - e_2) \dots (1 - h_{n-1} - e_{n-1})\,h_n\} = \{h_n\} - \sum_{r=1}^{n-1} \{(h_r + e_r)\,h_n\} + \sum_{r=1}^{n-2} \sum_{s=r+1}^{n-1} \{(h_r + e_r)(h_s + e_s)\,h_n\} - \dots \qquad (12)$$

Using such relations for homogeneous renewal processes as

$$\{h_r e_s\} = \{h_r\}\{g_{s-r}\}, \qquad \{h_r e_s e_n\} = \{h_r\}\{g_{s-r}\}\{e_{n-s}\},$$

we find that

$$\bar\Pi_P^{ST} = \frac{\Pi^{ST}}{(1 + \Pi^{TT})(1 + \Pi^{SS}) - \Pi^{ST}\Pi^{TS}}, \qquad (13)$$

where $\bar\Pi_P^{ST}$ is the first passage generating function without
intermediate recurrence of $S$.

These formulae also extend immediately to the case of continuous
time with densities $[e_t]$. Thus the formula analogous
to (11) is

$$M_P^{ST}(\psi) = M^{ST}/(1 + M^{TT}). \qquad (14)$$

It is perhaps useful in the case of (13) to check the completely
analogous formula by deriving it by an alternative argument.
The recurrence of $S$ from $0$ to $t$ may be split up into two contingencies:
(i) recurrence of $S$ conditional on no passage through $T$,
and (ii) first passage to $T$ conditional on no recurrence of $S$ followed
by first unconditional passage from $T$ to $S$. This readily gives, on
taking transforms,

$$M_R^S = C_R^S + \bar M_P^{ST} M_P^{TS}, \qquad (15)$$

where $C_R^S$ is the conditional recurrence function for $S$ as above
defined.

Similarly, the passage from $S$ to $T$ may consist either of
passage conditional on no recurrence of $S$, or recurrence of $S$
conditional on no passage through $T$ followed by unconditional
passage. Hence also

$$M_P^{ST} = \bar M_P^{ST} + C_R^S M_P^{ST}. \qquad (16)$$

From these two equations, we find the analogue of (13),

$$\bar M_P^{ST} = \frac{M^{ST}}{(1 + M^{SS})(1 + M^{TT}) - M^{ST}M^{TS}}, \qquad (17)$$

and also the further formula

$$C_R^S = \frac{M^{SS}(1 + M^{TT}) - M^{ST}M^{TS}}{(1 + M^{SS})(1 + M^{TT}) - M^{ST}M^{TS}}. \qquad (18)$$

Formula (18) also of course has its analogue for discrete time,

$$\bar\Pi_R^S = \frac{\Pi^{SS}(1 + \Pi^{TT}) - \Pi^{ST}\Pi^{TS}}{(1 + \Pi^{SS})(1 + \Pi^{TT}) - \Pi^{ST}\Pi^{TS}}, \qquad (19)$$

this being alternatively derivable by the method used for (13).
It should be noticed that even if $M_R^S(0)$, $M_P^{ST}(0)$ and $M_P^{TS}(0)$
are all unity (see §3.31) the equations (15) and (16) imply that
$C_R^S(0)$ and $\bar M_P^{ST}(0)$ are less than unity. This merely means that
any recurrence or passage may not satisfy the additional
requirement. To convert formulae like (13) or (18) into valid
conditional distributions, we may always divide by the total
probability sum $\bar\Pi_P^{ST}(1)$ or $C_R^S(0)$ respectively.
Fig. 3. [Diagram: states S and T along the time axis from 0 to t.]


Corresponding formulae may also be obtained if the time is
continuous but the number of states enumerable. For example,
for first passage regardless of intermediate recurrence we obtain
the relation for $p_{ST}(t) = \{h_t\}$,

$$p_{ST}(t) = \int_0^t\!\!\int_u^t \lambda e^{-\lambda u}\,f_{ST}(v - u)\,p_{TT}(t - v)\,dv\,du,$$

from the sequence of events (fig. 3): $S$ from $0$ to $u$, then (unconditional)
first passage to $T$ at $v$, and also occurrence of $T$ at $t$
(without conditions on the state during the interval $v$, $t$). In
this equation $p_{TT}(t) = \{f_t\}$, $f_{ST}(t)$ is the first passage density†
from $S$ to $T$, and we now write for clarity $\lambda_S \equiv \lambda$. For the transforms
we obtain

$$G^{ST}(\psi) = \frac{1}{1 + \psi/\lambda_S}\,M_P^{ST}(\psi)\,G^{TT}(\psi), \qquad (20)$$

or

$$M_P^{ST}(\psi) = (1 + \psi/\lambda_S)\,G^{ST}(\psi)/G^{TT}(\psi). \qquad (21)$$


For conditional recurrence and passage times we may employ
a similar argument to that used for the relations (15) and (16).
The only change is the insertion of a lifetime in $S$ after the conditional
recurrence of $S$ and a lifetime in $T$ after the conditional
passage to $T$. Hence (15) and (16) are replaced by the relations

$$M_R^S = C_R^S + \bar M_P^{ST}\,[1 + \psi/\lambda_T]^{-1}\,M_P^{TS}, \qquad (22)$$
$$M_P^{ST} = C_R^S\,[1 + \psi/\lambda_S]^{-1}\,M_P^{ST} + \bar M_P^{ST}, \qquad (23)$$

where $C_R^S$ denotes the transform of the conditional recurrence
distribution. Solving these equations, we find

$$\bar M_P^{ST} = \frac{G^{ST}}{\lambda_S\,[G^{SS}G^{TT} - G^{ST}G^{TS}]}, \qquad (24)$$

$$C_R^S = 1 + \frac{\psi}{\lambda_S} - \frac{G^{TT}}{\lambda_S\,[G^{SS}G^{TT} - G^{ST}G^{TS}]}. \qquad (25)$$

† For formal simplicity it is written as a density, though in general the
distribution will have a discrete component at zero, corresponding to a direct
transition to $T$ after leaving $S$.
The consistency of formulae (9), (21), (24) and (25) is checked
by the following elementary example of a Markov chain with the
transition rates between its only two states $S$ and $T$ given by
the matrix

$$\begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix}.$$

Here

$$\{e_t\} = \tfrac{2}{3} + \tfrac{1}{3}e^{-3t}, \qquad \{g_t\} = 2\{h_t\} = \tfrac{2}{3}(1 - e^{-3t}), \qquad \{f_t\} = \tfrac{1}{3} + \tfrac{2}{3}e^{-3t},$$

$$G^{SS} = \frac{2 + \psi}{\psi(3 + \psi)}, \qquad G^{TS} = 2G^{ST} = \frac{2}{\psi(3 + \psi)}, \qquad G^{TT} = \frac{1 + \psi}{\psi(3 + \psi)}.$$

We find on applying the above formulae

$$M_R^S = 2/(2 + \psi), \qquad C_R^S = 0, \qquad M_P^{ST} = \bar M_P^{ST} = 1,$$

which are obviously correct, as the recurrence time for $S$ is
identical in this example with lifetime in $T$, no recurrence is
possible without traversing $T$, and conditional and unconditional
passages from $S$ to $T$ are identical and take zero time as
no other state is possible.
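The two-state check can be repeated numerically. The sketch below is illustrative only: it reads formula (9) as $M_R = 1 + \psi/\lambda_S - 1/(\lambda_S G^{SS})$, (21) as $M_P = (1 + \psi/\lambda_S)\,G^{ST}/G^{TT}$, and (24), (25) as the expressions with denominator $\lambda_S (G^{SS}G^{TT} - G^{ST}G^{TS})$. It takes the rates out of $S$ and $T$ to be 1 and 2, and the function names are hypothetical.

```python
def transforms(psi):
    """Laplace transforms of the four transition probabilities for the
    two-state chain with rate 1 out of S and rate 2 out of T."""
    d = psi * (3.0 + psi)
    return {"SS": (2.0 + psi) / d, "ST": 1.0 / d,
            "TS": 2.0 / d, "TT": (1.0 + psi) / d}

def check(psi, lam_s=1.0):
    g = transforms(psi)
    delta = g["SS"] * g["TT"] - g["ST"] * g["TS"]
    m_r = 1.0 + psi / lam_s - 1.0 / (lam_s * g["SS"])     # recurrence of S, (9)
    m_p = (1.0 + psi / lam_s) * g["ST"] / g["TT"]        # first passage S to T, (21)
    m_p_cond = g["ST"] / (lam_s * delta)                  # conditional passage, (24)
    c_r = 1.0 + psi / lam_s - g["TT"] / (lam_s * delta)   # conditional recurrence, (25)
    return m_r, m_p, m_p_cond, c_r

results = {psi: check(psi) for psi in (0.1, 1.0, 5.0)}
```

For every value of `psi` the four numbers should come out as `2/(2 + psi)`, `1`, `1` and `0`, matching the values quoted for this example.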

Finally, we may require conditional first passages of other
types. Thus for three different states $S$, $T$, $U$, say, let us consider
first passage from $S$ to $T$, conditional on no intermediate passage
to $U$. This is merely an extension of the previous conditional
passage time (for which $U \equiv S$), and an obvious extension of the
previous arguments may be made. Thus formulae (15) and (16)
generalize to

$$M_P^{SU} = \bar M_P^{SU} + \bar M_P^{ST} M_P^{TU}, \qquad M_P^{ST} = \bar M_P^{ST} + \bar M_P^{SU} M_P^{UT}, \qquad (26)$$

whence now

$$\bar M_P^{ST} = \frac{M_P^{ST} - M_P^{SU} M_P^{UT}}{1 - M_P^{TU} M_P^{UT}}. \qquad (27)$$

Application to diffusion processes. It should perhaps be
pointed out that these formulae cannot immediately be applied
to a normal diffusion process, as, for example, represented by
equation (10) of §3.1. This is because of the absence of a finite
'velocity' for such processes (see Chapter 5), and correspondingly
of a finite density of occurrence of a state $S \equiv x$ in time, even
although the probability density with respect to the continuum
of states $x$ is finite. However, it is not difficult to make the
appropriate modifications. Thus as one possibility we may reinterpret
$M^{ST}$, say, in formula (14) as the transform of the space
density function, and consider the first passage time not to the
'point state' $x$, but to the interval $(x, x + dx)$. An integral equation
argument then leads to the result $M_P^{ST} = M^{ST}/M^{TT}$ in place
of (14). Formula (27) correspondingly becomes

$$\bar M_P^{ST} = (M^{ST} M^{UU} - M^{UT} M^{SU})/(M^{TT} M^{UU} - M^{UT} M^{TU}).$$

From equation (10) of §3.1, we find (for $S \equiv 0$, $T \equiv b$)

$$M^{ST} = \frac{1}{\sqrt{m^2 + 2\psi}} \exp\{bm - b\sqrt{m^2 + 2\psi}\} \quad (b > 0), \qquad (28)$$

whence also

$$M^{TT} = 1/\sqrt{m^2 + 2\psi}. \qquad (29)$$

These formulae give, for example,

$$M_P^{ST} = \exp\{bm - b\sqrt{m^2 + 2\psi}\} \quad (b > 0),$$

in agreement with equation (21) of §2.1 ($\psi = -i\phi$).


3.31 Ergodic properties. In the relation $\Pi_R = \Pi/(1 + \Pi)$
for renewal sequences obtained in the previous section (equation
(3)), valid for $0 \le z < 1$, it is evident that $\Pi_R$ also exists for $z = 1$,
but may not necessarily be unity; in fact, the event $S$ is only
certain to recur if $\Pi_R(1) = 1$, the recurrence of $S$ being otherwise
uncertain. $\Pi_R(1) < 1$ implies (and conversely) that $\Pi(1) < \infty$,
which is thus an equivalent condition for $S$ being uncertain
(transient).

When $S$ is certain, we assume for convenience that $S$ is not
periodic, i.e. does not recur only at times $t, 2t, 3t, \dots$ ($t > 1$) (in
this latter case we can always regard $z^t$ as the argument of the
generating functions instead of $z$). Then a theorem given by
Erdős, Feller and Pollard (1949) (cf. also Kolmogorov, 1936)
establishes the ergodic formula

$$\lim_{r \to \infty} \{e_r\} = 1/\mu, \qquad (1)$$

where $\mu$ is the mean $[\partial \Pi_R/\partial z]_{z=1}$. This formula remains true if $\mu$
is infinite, when $\{e_r\} \to 0$ as $r \to \infty$.
In the case of continuous time with a time density of occurrence
$[e_t]$, we may deduce from equation (5), §3.3, that $M_R(\psi) \to 1$
as $\psi \to 0$ if $M(\psi) \to \infty$, and $S$ is then certain to recur. The relevant
theorem† for such a process (for which no periodic case
arises) is now one given by Blackwell (1948), namely, that if the
expected number of recurrences or renewals in the interval
$(0, t)$ is $U(t)$, and $\mu$ is the mean recurrence or renewal interval, then

$$\lim_{t \to \infty} [U(t + h) - U(t)] = h/\mu \qquad (2)$$

for every $h > 0$. This formula also remains true if $\mu$ is infinite.
In the present case, $[e_t] = dU(t)/dt$.

In the case of continuous time but an enumerable set of states,
so that equation (9) of §3.3 holds, we note that $G(0) = \infty$ still
implies $M_R(0) = 1$ and alternatively, $G(0) < \infty$ implies $M_R(0) < 1$
(and conversely). To obtain a formula corresponding to (1) or
(2), apply Blackwell's theorem to a 'renewal' defined for the
immediate purpose as the complete lifetime plus recurrence
of $S$. As contributions to $\{e_t\}$ can arise from renewals of $S$ in this
sense occurring at a previous time $u$ and surviving in state $S$
to time $t$, we have the relation

$$\{e_t\} = \int_0^t e^{-\lambda_S (t - u)}\,dU(u). \qquad (3)$$

$U(u)$ is now the 'convolution' of the lifetime and recurrence
distributions, each with a density, so that $U(t)$ is still differentiable.
It follows that $\{e_t\} \to 1/(\lambda_S \mu)$ as $t \to \infty$, where $\mu$ is the sum
of the lifetime mean, $1/\lambda_S$, and the true recurrence mean $\mu_R$, say;
that is, $\mu = \mu_R + 1/\lambda_S$ and

$$\lim_{t \to \infty} \{e_t\} = 1/(1 + \lambda_S \mu_R). \qquad (4)$$

† A more recent theorem in this class, covering both results (1) and (2)
(and also the limit of $\{e_t\}$ in (3)), has been given by Smith (1954).
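The limit for an enumerable set of states can be illustrated on a two-state chain, reading the ergodic formula as $\lim \{e_t\} = 1/(1 + \lambda_S \mu_R)$. The sketch assumes rates 1 and 2 out of $S$ and $T$ (illustrative values), so that the mean return time to $S$ is simply the mean lifetime in $T$. The variable names are hypothetical.

```python
import math

lam_s, lam_t = 1.0, 2.0     # assumed rates out of S and out of T
mu_r = 1.0 / lam_t          # mean time from leaving S to returning to it

def e_t(t):
    """P{S at t | S at 0} for the two-state chain with these rates."""
    total = lam_s + lam_t
    return lam_t / total + (lam_s / total) * math.exp(-total * t)

limit_from_formula = 1.0 / (1.0 + lam_s * mu_r)   # 1 / (1 + lambda_S * mu_R)
limit_observed = e_t(50.0)                        # effectively the limiting value
```

Both quantities should equal 2/3, the long-run proportion of time spent in $S$.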
We may now indicate briefly the implications of the above
results for Markov processes, for which every state has the
renewal or regenerative property. We consider only irreducible
processes, for which any state can be reached with non-zero
probability from any other state after some interval; processes
with 'absorbing' states are thus excluded. Then for Markov
chain sequences† with a state $X_i$ certainly recurrent, we may,
making use of an argument due to Feller, choose intervals of
minimum sufficient length to allow passage from state $X_i$ to
another state $X_j$ and back to occur, and write

$$\begin{aligned}
q_{ii}(t + \tau_1 + \tau_2) &\ge q_{ij}(\tau_1)\,q_{jj}(t)\,q_{ji}(\tau_2) = \alpha\,q_{jj}(t), \\
q_{jj}(t + \tau_1 + \tau_2) &\ge q_{ji}(\tau_2)\,q_{ii}(t)\,q_{ij}(\tau_1) = \beta\,q_{ii}(t),
\end{aligned} \qquad (5)$$

where $\alpha, \beta > 0$ (here, as in §2.2, $q_{ij}(t)$ denotes $P(S_i \text{ at } t \mid S_j \text{ at } 0)$).
The relations (5) imply that $q_{ii}(t)$ and $q_{jj}(t)$ have the same type
of limiting behaviour, corresponding to the three alternatives:
$X_j$ also certainly recurrent, with (i) $q_{ii}(t)$ and $q_{jj}(t)$ both tending
to zero, (ii) $q_{ii}(t)$ and $q_{jj}(t)$ both tending to non-zero limits,
(iii) $X_i$ and $X_j$ both periodic, with the same period. (To show
that $X_j$ is also certainly recurrent, sum equations (5) over $t$, and
use the result that $X_i$ certainly recurrent implies the divergence
of $\sum_t q_{ii}(t)$.)

Moreover, from the relation $\Pi_P^{ST} = \Pi^{ST}/(1 + \Pi^{TT})$ (equation
(11), §3.3), with $S \equiv X_i$, $T \equiv X_j$, as $\Pi_P^{ST}(1) \le 1$, we must have
$\Pi^{ST}(1) < \infty$ if $\Pi^{TT}(1) < \infty$ (the case of uncertain recurrence).
Alternatively, for certain recurrence we must have $\Pi_P^{ST}(1) = 1$,
hence $\Pi^{ST}(1)/\Pi^{TT}(1) \to 1$. In this last case (if also non-periodic,
so that $q_{jj}(t) \to 1/\mu_j$) it readily follows from the probability
relation equivalent to equation (11), §3.3, that as $t$ increases
$q_{ji}(t)$ also tends to $1/\mu_j$. This is still true if $\mu_j = \infty$.

For a finite or enumerable set of states but continuous time,
a similar argument makes use of equations (9) and (21) of §3.3,
together with the relation (4) above. The periodic case does not
now arise; and the intervals $\tau_1$ and $\tau_2$ in (5) may now be any non-zero
intervals. It has of course been implicitly assumed in this
recurrence theory that a valid Markov process in continuous
time has been specified; the question of whether this is achieved
by means of differential equations of the type derived in §3.2 is
returned to in the last section of this chapter.

† For a more detailed and exhaustive discussion of this case, reference
should be made to Feller (1950, Chapter 15).

For continuous time and a continuous infinity of states with
finite time densities of occurrence $f_{ii}(t) \equiv [e_t]$, $f_{jj}(t) \equiv [f_t]$, etc.
(excluding periodic processes), we may for certainly recurrent
$X_i$ replace (5) by the relations

$$\begin{aligned}
\int_t^{t+h} f_{ii}(\tau + \tau_1 + \tau_2 + h)\,d\tau &\ge \iiint f_{ij}(\tau_1 + u)\,f_{jj}(\tau + h - u - v)\,f_{ji}(\tau_2 + v)\,du\,dv\,d\tau, \\
\int_t^{t+h} f_{jj}(\tau + \tau_1 + \tau_2 + h)\,d\tau &\ge \iiint f_{ji}(\tau_2 + v)\,f_{ii}(\tau + h - u - v)\,f_{ij}(\tau_1 + u)\,du\,dv\,d\tau,
\end{aligned} \qquad (6)$$

where the integration is over the interval $(0, h)$ for $u$ and $v$.
These relations ensure the same type of limiting behaviour for
$\int_t^{t+h} f_{ii}(v)\,dv$ and $\int_t^{t+h} f_{jj}(v)\,dv$. The formula $M_P^{ST} = M^{ST}/(1 + M^{TT})$
(equation (14) of §3.3) ensures that since $M_P^{ST}(0) \le 1$, $M^{TT}(0) < \infty$
implies $M^{ST}(0) < \infty$. Alternatively, if $M^{TT}(0) = \infty$, so that
recurrence is certain, we must have $M_P^{ST}(0) = 1$ and $\int_t^{t+h} f_{ji}(u)\,du$
and $\int_t^{t+h} f_{jj}(u)\,du$ have the same limit $h/\mu_j$ (which is zero if $\mu_j$ is
infinite).


3.32 Alternative method for Markov chains. An alternative
to the use of 'renewal' arguments for obtaining recurrence
distributions in the case of Markov chains with a finite, or
effectively finite, number of states is sometimes convenient for
direct calculations. It consists of modifying the transition or
conditional probability matrix by prohibiting all transitions out
from the specified state $S$. This automatically 'freezes' the state
$S$ once it occurs, and the probability of any lifetime in the composite
state 'not $S$' (which determines the recurrence time back
to $S$) is identified with the probability of still being in 'not $S$'
for the modified process. Thus for discrete time and a matrix $Q$
of transition probabilities, the modified matrix is, say,

$$Q_S = \begin{pmatrix} 1 & x' \\ 0 & B \end{pmatrix},$$

where for convenience the top left-hand corner corresponds to
the specified state $S$. The recurrence time $\theta_1$, say, will be defined
analogously to the definition for continuous time, as the time to
first recurrence after having left $S$ (i.e. continuation in state $S$
is reckoned as the 'lifetime' of $S$, and no longer counted as recurrence;
this corresponds to the modified recurrence distribution
$\Pi_R'$ in equation (8) of §3.3). The probability column vector $p_0$,
given 'not $S$' at time zero (and $S$ at time $-1$), will have the form
$p_0' = (0, y')$, say. Hence

$$Q_S^r\,p_0 = \begin{pmatrix} 1 & x_r' \\ 0 & B^r \end{pmatrix} \begin{pmatrix} 0 \\ y \end{pmatrix} = \begin{pmatrix} x_r' y \\ B^r y \end{pmatrix},$$

and the cumulative distribution is $1 - [B^r y]$, where $[y]$ denotes
(in the present context) the sum of the elements of the vector $y$.
The generating function is thus given by

$$\Pi_R' = \sum_{r=1}^{\infty} [(B^{r-1} - B^r)\,y]\,z^r = \left[\frac{(I - B)z}{I - Bz}\,y\right]. \qquad (1)$$

If $p_0$ is not conditional on $S$ at time $-1$, the above procedure
determines the distribution $\Pi_R''$, say, of another recurrence time
$\theta_2$, say, which is recurrence to $S$, measured from a time for which
the state is not $S$. For example, in the case of only two states, let
$B = 1 - \lambda$, and with $y = 1$ for either $\theta_1$ or $\theta_2$, we confirm the
trivial solution

$$\Pi_R' = \Pi_R'' = \lambda z/[1 - (1 - \lambda)z]. \qquad (2)$$
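The 'frozen state' method lends itself to direct computation. The sketch below is an illustration under stated assumptions: a made-up three-state chain whose columns hold the transition probabilities out of each state, with state 0 playing the role of $S$, and hypothetical function names. It obtains the recurrence-time probabilities as $[(B^{r-1} - B^r)\,y]$, where $B$ is the block of transitions among the states other than $S$.

```python
def mat_vec(b, y):
    """Product of a square matrix (list of rows) with a column vector."""
    return [sum(b[i][j] * y[j] for j in range(len(y))) for i in range(len(b))]

# Illustrative chain: column j lists the probabilities of moving out of state j,
# so each column sums to one. State 0 is the specified state S.
Q = [[0.5, 0.2, 0.3],
     [0.3, 0.6, 0.1],
     [0.2, 0.2, 0.6]]

B = [row[1:] for row in Q[1:]]           # transitions among the 'not S' states
leaving = Q[1][0] + Q[2][0]
y = [Q[1][0] / leaving, Q[2][0] / leaving]   # distribution just after leaving S

def recurrence_distribution(b, y, n_max):
    """P(theta_1 = r) = [(B^(r-1) - B^r) y] for r = 1..n_max."""
    probs, current = [], y[:]
    for _ in range(n_max):
        nxt = mat_vec(b, current)
        probs.append(sum(current) - sum(nxt))
        current = nxt
    return probs

probs = recurrence_distribution(B, y, 200)
mean_theta_1 = sum(r * p for r, p in enumerate(probs, start=1))
```

With only two states, `recurrence_distribution([[1 - lam]], [1.0], n)` reduces to the geometric probabilities `lam * (1 - lam) ** (r - 1)`, which is the trivial solution quoted above.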


Similarly for continuous time, let the transition matrix be
of the form $Q = I + R\,\delta t + o(\delta t)$. Let

$$R_S = \begin{pmatrix} 0 & x' \\ 0 & C \end{pmatrix}$$

be the modification of $R$ in which the first column has all zeros.
The cumulative recurrence distribution for $\theta_1$ becomes $1 - [e^{Ct} y]$,
and the Laplace transform

$$M_R'(\psi) = -\int_0^{\infty} e^{-\psi u}\,[C e^{Cu} y]\,du = \left[\frac{C}{C - \psi I}\,y\right]. \qquad (3)$$

Again for the case of only two states, and $C = -\lambda$,

$$M_R'(\psi) = M_R''(\psi) = 1/(1 + \psi/\lambda), \qquad (4)$$

where $M_R''(\psi)$ is the similar distribution for $\theta_2$.

The general type of recurrence-time distribution for finite
Markov chains in continuous time will obviously be a sum of
exponential distributions if $C$ has simple roots; if multiple roots
exist, the expression $[e^{Ct} y]$ will involve components of the form
$P(t)\,e^{\lambda t}$, where $P(t)$ is a polynomial in $t$ (the distribution is thus
expressible as a sum of $\chi^2$ distributions).


3.4 Multiplicative chains

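We return now to the discussion of important types of Markov
chain begun at the end of §3.2. It is very illuminating to consider
directly the equations for multiplicative chains when the time
becomes continuous. It will be recalled that our general equation
for multiplicative chains in the case of more than one type of
individual was expressible in the dual form (end of §2.3)

$$\Pi_n(z) = \Pi_{n-1}(G(z)) = G(\Pi_{n-1}(z)).$$

Let us now postulate that as the time interval $\delta t$ becomes small

$$G(z) = z + g(z)\,\delta t + o(\delta t),$$

then we see that in the limit we must have

$$\frac{\partial \Pi_t(z)}{\partial t} = \sum_{i=1}^{k} g^{(i)}(z)\,\frac{\partial \Pi_t(z)}{\partial z_i} \qquad (1)$$

for $k$ types of individual, where $g^{(i)}$ denote the components of $g$.
Alternatively, from the second form of the recurrence relation,
we obtain

$$\frac{\partial \Pi_t(z)}{\partial t} = g(\Pi_t(z)). \qquad (2)$$

In the particular case of one type of individual, these reduce to

$$\frac{\partial \Pi_t(z)}{\partial t} = g(z)\,\frac{\partial \Pi_t(z)}{\partial z}, \qquad (3)$$

$$\frac{\partial \Pi_t(z)}{\partial t} = g(\Pi_t(z)). \qquad (4)$$

The equations (1) and (3) correspond to the forward equations
of §3.2, equations (2) and (4) to the backward equations. From
(4) we obtain immediately

$$t = \int_z^{\Pi_t(z)} \frac{du}{g(u)}, \qquad (5)$$

as $\Pi_0(z) = z$. For example, if

$$g(z) = \lambda(z^2 - z) + \mu(1 - z), \qquad (6)$$

representing a 'birth' rate $\lambda$ per individual and a 'death' rate $\mu$,
we obtain

$$\int \frac{du}{g(u)} = \frac{1}{\lambda - \mu} \int \left(\frac{1}{u - 1} - \frac{\lambda}{\lambda u - \mu}\right) du = \frac{1}{\lambda - \mu} \log \frac{u - 1}{\lambda u - \mu},$$

whence we find from (5)

$$\Pi_t(z) = \frac{\mu w - 1}{\lambda w - 1}, \qquad w = \frac{(z - 1)\,e^{(\lambda - \mu)t}}{\lambda z - \mu}, \qquad (7)$$

starting from one individual, or $\Pi_t^n(z)$ from $n$. In the case $\lambda = \mu$,
this becomes

$$\Pi_t(z) = \frac{1 - (\lambda t - 1)(z - 1)}{1 - \lambda t(z - 1)}. \qquad (8)$$

If we expand $\log \Pi_t(z)$ in (7) or (8) in powers of $\theta = \log z$, we
obtain the cumulants of the number $N_t$ of individuals at time $t$.
In particular, it is readily found that the mean and variance are
given by

$$m = e^{(\lambda - \mu)t}, \qquad \sigma^2 = \begin{cases} \dfrac{\lambda + \mu}{\lambda - \mu}\,e^{(\lambda - \mu)t}\,[e^{(\lambda - \mu)t} - 1] & (\lambda \neq \mu), \\[1ex] 2\lambda t & (\lambda = \mu). \end{cases} \qquad (9)$$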



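In the case $\lambda = \mu$, we have an example of a process with constant
mean but with ever-increasing variance. For the probability
of extinction, we have

\[ \Pi_t(0) = \begin{cases}
\dfrac{\mu\,(e^{(\lambda-\mu)t} - 1)}{\lambda\,e^{(\lambda-\mu)t} - \mu} & (\lambda \ne \mu), \\[1ex]
\dfrac{\lambda t}{1 + \lambda t} & (\lambda = \mu).
\end{cases} \qquad (10) \]

It will be seen that as $t \to \infty$, the chance of extinction tends to
one for $\lambda \le \mu$, and $\mu/\lambda$ for $\lambda > \mu$. The corresponding limiting
chances for $n$ initial individuals are one and $(\mu/\lambda)^n$.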
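The closed forms (7) to (10) are easily checked numerically. The following is a minimal sketch only, assuming Python for convenience; the function names `pgf`, `mean_var` and `extinction` are illustrative and do not come from the text.

```python
import math

def pgf(z, t, lam, mu):
    """P.g.f. of N_t for one initial individual, equations (7) and (8)."""
    if math.isclose(lam, mu):
        return (1 - (lam * t - 1) * (z - 1)) / (1 - lam * t * (z - 1))
    w = math.exp((lam - mu) * t)
    return (mu * w * (z - 1) - (lam * z - mu)) / (lam * w * (z - 1) - (lam * z - mu))

def mean_var(t, lam, mu):
    """Mean and variance of N_t for one initial individual, equation (9)."""
    if math.isclose(lam, mu):
        return 1.0, 2 * lam * t
    w = math.exp((lam - mu) * t)
    return w, (lam + mu) / (lam - mu) * w * (w - 1)

def extinction(t, lam, mu, n=1):
    """Chance that a chain started from n individuals is extinct by time t, from (10)."""
    return pgf(0.0, t, lam, mu) ** n

if __name__ == "__main__":
    for t in (1.0, 5.0, 50.0):
        print(t, mean_var(t, 2.0, 1.0), extinction(t, 2.0, 1.0))
```

With a birth rate of 2 and a death rate of 1 the printed extinction chances approach $\mu/\lambda = 1/2$ as $t$ grows, in agreement with the limit stated above.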


[Fig. 4. Three independent realizations of a simple birth-and-death
multiplicative chain with $\lambda = \mu = 1$, each beginning with five
individuals at $t = 0$. Vertical axis: population size; horizontal axis: $t$.]

To illustrate the variability between different realizations of
this process, especially when $\lambda = \mu$, a number of artificial
realizations were constructed for the case $\lambda = \mu = 1$, each beginning
with five individuals at $t = 0$. Three are shown in fig. 4, of which
one achieved extinction before $t = 4$. The method of construction
used the fact that after any event leaving $N$ individuals, the
chance of the next event (transition) in time $dt$ is $N(\lambda+\mu)\,dt$, so that
the time until the next event is a random variable $T$ following
the exponential distribution $e^{-T/m}\,dT/m$, where $1/m = N(\lambda + \mu)$.
When the event occurs, the chance that it is a birth is $\lambda/(\lambda+\mu)$.
(For the estimation of $\lambda$ and $\mu$ from such a realization, see
Chapter 8.)
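The method of construction just described translates directly into a short program. The sketch below is illustrative only: the function name `simulate_birth_death`, the seed and the use of Python's standard `random` module are assumptions, not part of the original construction.

```python
import random

def simulate_birth_death(n0, lam, mu, t_max, rng):
    """Return the event times and population sizes of one realization."""
    t, n = 0.0, n0
    times, sizes = [t], [n]
    while n > 0:
        # The waiting time to the next event is exponential with rate n(lam + mu).
        t += rng.expovariate(n * (lam + mu))
        if t > t_max:
            break
        # The event is a birth with chance lam/(lam + mu), otherwise a death.
        n += 1 if rng.random() < lam / (lam + mu) else -1
        times.append(t)
        sizes.append(n)
    return times, sizes

if __name__ == "__main__":
    rng = random.Random(4)  # arbitrary seed, chosen only for repeatability
    for k in range(3):
        times, sizes = simulate_birth_death(5, 1.0, 1.0, 4.0, rng)
        print("realization", k, "events:", len(times) - 1, "final size:", sizes[-1])
```

Plotting `sizes` against `times` as a step function gives curves of the kind shown in fig. 4. A realization stops early when the population reaches zero, since no further events are then possible.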

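The more general result (5) (for one initial individual) enables
the limiting distribution as $t \to \infty$ of $N_t e^{-\nu t}$, where $E\{N_t\} = e^{\nu t}$, to
be investigated. We must have for $L(\theta) \equiv \lim L_t(\theta)$, where $L_t(\theta)$
is the m.g.f. of $N_t e^{-\nu t}$ ($\nu > 0$),

\[ L(\theta(1 + \nu\,\Delta t)) = L(\theta) + g(L(\theta))\,\Delta t + o(\Delta t), \]

or

\[ \nu\theta\,\frac{\partial L}{\partial \theta} = g(L(\theta)), \qquad (11) \]

whence

\[ \int^{L} \frac{du}{g(u)} = \frac{1}{\nu}\log\theta + \text{constant}, \]

where the constant may be determined from the condition
$(\partial L/\partial\theta)_{\theta=0} = 1$. For example, when

\[ g(z) = (\lambda z - \mu)(z - 1), \qquad \nu = \lambda - \mu, \]

\[ L = (1 - C\mu\theta)/(1 - C\lambda\theta), \]

where $C = 1/\nu$, so that finally

\[ L = (1 - \mu\theta/\nu)/(1 - \lambda\theta/\nu), \qquad (12) \]

as could of course have been obtained from equation (7). By
writing (12) in the form

\[ L = \frac{\mu}{\lambda} + \left(1 - \frac{\mu}{\lambda}\right)\frac{1}{1 - \lambda\theta/\nu}, \]

we see that it has a discrete component $\mu/\lambda$ at the origin, corresponding
to the chance of extinction, and a remaining continuous
component with exponential distribution.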

The non-homogeneous birth-and-death process. The solution (7)
for the homogeneous birth-and-death process could alternatively
have been obtained from equation (3), which in this case is

\[ \frac{\partial \Pi_t(z)}{\partial t} = (z-1)(\lambda z-\mu)\,\frac{\partial \Pi_t(z)}{\partial z}. \qquad (13) \]

It will be clear from the structure of (3) that equation (13) still
holds if $\lambda$ and $\mu$ are functions of $t$. The solution of (13) is
$\Pi_t(z) = \Psi(Z)$, say, where $Z = \text{constant}$ is a solution of the equation

\[ dz/dt + (z-1)(\lambda z-\mu) = 0. \]

This equation becomes, with $z - 1 = 1/u$,

\[ du/dt + (\mu-\lambda)\,u = \lambda, \]

with solution

\[ u\,e^{\rho} = \int_0^t \lambda\,e^{\rho}\,d\tau + \text{constant}, \]

where

\[ \rho(t) = \int_0^t (\mu - \lambda)\,d\tau. \]

Hence

\[ \Pi_t(z) = \Psi\!\left(\frac{e^{\rho}}{z-1} - \int_0^t \lambda\,e^{\rho}\,d\tau\right), \]

where, from $\Pi_0(z) = z$, we find $\Psi(x) = 1 + 1/x$, whence

\[ \Pi_t(z) = 1 + \frac{1}{\dfrac{e^{\rho}}{z-1} - \displaystyle\int_0^t \lambda\,e^{\rho}\,d\tau}. \qquad (14) \]

This solution is due to D. G. Kendall (1948), who deduced from
it that $\Pi_t(0)$, the probability of no individuals at time $t$, tends to
unity as $t$ increases, if and only if

\[ e^{\rho} + \int_0^t \lambda\,e^{\rho}\,d\tau = 1 + \int_0^t \mu\,e^{\rho}\,d\tau \to \infty \]

as $t \to \infty$.
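When $\lambda$ and $\mu$ vary with time the integrals in (14) rarely have closed forms, but they are simple to evaluate by quadrature. The sketch below is one way to do so, under the assumption that the rates are supplied as Python functions of time; the name `kendall_pgf`, the trapezoidal rule and the step count are illustrative choices, not part of Kendall's treatment.

```python
import math

def kendall_pgf(z, t, lam, mu, steps=2000):
    """Evaluate formula (14) for z != 1, with lam and mu given as functions of time."""
    h = t / steps
    rho = 0.0        # running value of rho(tau), the integral of (mu - lam) from 0 to tau
    integral = 0.0   # running integral of lam * exp(rho) from 0 to tau
    prev_rate = mu(0.0) - lam(0.0)
    prev_f = lam(0.0)  # lam * exp(rho) at tau = 0, where rho = 0
    for k in range(1, steps + 1):
        tau = k * h
        rate = mu(tau) - lam(tau)
        rho += 0.5 * h * (prev_rate + rate)     # trapezoidal step for rho
        f = lam(tau) * math.exp(rho)
        integral += 0.5 * h * (prev_f + f)      # trapezoidal step for the integral
        prev_rate, prev_f = rate, f
    return 1.0 + 1.0 / (math.exp(rho) / (z - 1.0) - integral)

if __name__ == "__main__":
    # Constant rates reproduce the homogeneous extinction chance (10).
    print(kendall_pgf(0.0, 0.5, lambda s: 2.0, lambda s: 1.0))
    # A death rate growing linearly with time, with arbitrary illustrative constants.
    print(kendall_pgf(0.0, 3.0, lambda s: 1.0, lambda s: 0.5 * s))
```

Setting `z = 0.0` gives the chance of extinction by time $t$; a numerical derivative at $z = 1$ recovers the mean $e^{-\rho}$.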

A particular case of the above process was considered by
Arley (1943) in his discussion of physical cascade showers, where
cosmic-ray particles give rise on impact with matter to secondary
particles, which in turn may give rise to further particles, before
such particles lose their energy and are absorbed. In a simplified
model for this process he assumed that $\lambda$ was constant, but that
the chance of the 'extinction' of a particle increased linearly with
the time. (This is not an exact treatment, which would have to
be based on a probability 'balance-sheet' of the previous random
impacts of the particle; cf. § 3.41.) We thus assume that $\mu(t)$ is
of the form $\mu t$, where $\mu$ is constant. Then $\rho = \tfrac12\mu t^2 - \lambda t$, and

\[ \Pi_t(z) = 1 + \frac{1}{\dfrac{e^{\frac12\mu t^2 - \lambda t}}{z-1} - \lambda\displaystyle\int_0^t e^{\frac12\mu\tau^2 - \lambda\tau}\,d\tau}. \qquad (15) \]

The mean growth $m$ is found to follow the Gaussian law

\[ m = e^{\lambda t - \frac12\mu t^2}, \qquad (16) \]

while for the variance $v$, we find

\[ v = e^{\lambda t - \frac12\mu t^2} - e^{2\lambda t - \mu t^2} + 2\lambda\,e^{2\lambda t - \mu t^2}\int_0^t e^{\frac12\mu\tau^2 - \lambda\tau}\,d\tau. \qquad (17) \]

Two or more types of individual. In the application to cosmic-ray
showers there are actually two types of particle, photons
and electrons, to be considered (positive and negative electrons
not being distinguished). These give rise respectively to pairs
of electrons and photon-electron pairs, and (writing for convenience
of notation $w$, $z$ for $z_1$, $z_2$) we have as a somewhat more
realistic model

\[ g_1(w, z; t) = \lambda_1(z^2 - w) + \mu_1(1 - w), \]

\[ g_2(w, z; t) = \lambda_2(zw - z) + \mu_2(1 - z), \]

where $w$ corresponds to photons, $z$ to electrons, and the death-rates
$\mu_1$ and $\mu_2$ ($\lambda$ and $\mu$ being in general dependent on $t$) are, as
before, an approximate representation of the loss of energy of
the particles in time due to previous collisions. The resulting
equation

\[ \frac{\partial \Pi_t}{\partial t} = g_1\,\frac{\partial \Pi_t}{\partial w} + g_2\,\frac{\partial \Pi_t}{\partial z} \qquad (18) \]

has been solved for small $t$ (i.e. small distances) when the 'death-rates'
$\mu_1$ and $\mu_2$ can be neglected, it being further assumed
that $\lambda_1 = \lambda_2 = \lambda$ (constant). The auxiliary equations to (18) then
become

\[ \frac{dz}{dt} + \lambda(zw - z) = 0, \qquad \frac{dw}{dt} + \lambda(z^2 - w) = 0. \qquad (19) \]

It is now possible to obtain a solution, making use of the transformation

\[ z\,e^{-\lambda t} = u, \qquad w\,e^{-\lambda t} = v, \qquad T = e^{\lambda t}, \]

whence

\[ u = A/\sinh[A(T - B)], \qquad v = A\coth[A(T - B)]. \qquad (20) \]

Solving equations (20) for $A$ and $B$ to obtain two independent
integrals of (19) and making use of the initial condition
$\Pi_0(w, z) = z$ (for one initial electron), we find that

\[ \Pi_t(w, z) = \frac{e^{-\lambda t}\sqrt{w^2 - z^2}}{\sinh\left\{(e^{-\lambda t} - 1)\sqrt{w^2 - z^2} + \coth^{-1}\left[w/\sqrt{w^2 - z^2}\right]\right\}}, \qquad (21) \]

and similarly if $\Pi_0 = w$ (one initial photon)

\[ \Pi_t(w, z) = e^{-\lambda t}\sqrt{w^2 - z^2}\,\coth\left\{(e^{-\lambda t} - 1)\sqrt{w^2 - z^2} + \coth^{-1}\left[w/\sqrt{w^2 - z^2}\right]\right\}. \qquad (22) \]
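The moments may be obtained by differentiating (21) or (22)
with respect to $w$ and $z$, but it is usually simpler to differentiate
the initial equation and solve directly. In fact, for homogeneous
chains, we derive directly from (1) and (2) the first and second
factorial moment equations

\[ \frac{\partial m^i_j}{\partial t} = \left[\frac{\partial g^l}{\partial z_j}\right] m^i_l, \qquad
\frac{\partial m^i_{jk}}{\partial t} = \left[\frac{\partial g^l}{\partial z_j}\right] m^i_{lk} + \left[\frac{\partial g^l}{\partial z_k}\right] m^i_{jl} + \left[\frac{\partial^2 g^l}{\partial z_j\,\partial z_k}\right] m^i_l, \qquad (23) \]

and

\[ \frac{\partial m^i_j}{\partial t} = \left[\frac{\partial g^i}{\partial z_l}\right] m^l_j, \qquad
\frac{\partial m^i_{jk}}{\partial t} = \left[\frac{\partial g^i}{\partial z_l}\right] m^l_{jk} + \left[\frac{\partial^2 g^i}{\partial z_l\,\partial z_m}\right] m^l_j\,m^m_k. \qquad (24) \]

In these alternative sets of equations, $m^i_j$ and $m^i_{jk}$ refer to the
first- and second-order factorial moments

\[ E\{N_j(t)\}, \qquad E\{N_j(t)N_k(t)\} \ (j \ne k), \qquad E\{N_j(t)(N_j(t) - 1)\} \ (j = k), \]

for an initial individual of type $i$, the square bracket denotes
that the expression contained in it is to be evaluated for $z = 1$,
and the summation convention is to be understood.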

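To illustrate these general equations in the case of the above
example, we put $\lambda = 1$ for convenience and, more consistently
with the above notation,

\[ g^1(z_1, z_2) \equiv g_1(w, z) = z_2^2 - z_1, \qquad g^2(z_1, z_2) \equiv g_2(w, z) = z_1 z_2 - z_2. \]

Then equations (23) give

\[ \frac{\partial m^i_1}{\partial t} = -m^i_1 + m^i_2, \qquad \frac{\partial m^i_2}{\partial t} = 2m^i_1, \]

and

\[ \frac{\partial}{\partial t}\begin{pmatrix} m^i_{22} \\ m^i_{12} \\ m^i_{11} \end{pmatrix}
= \begin{pmatrix} 0 & 4 & 0 \\ 1 & -1 & 2 \\ 0 & 2 & -2 \end{pmatrix}
\begin{pmatrix} m^i_{22} \\ m^i_{12} \\ m^i_{11} \end{pmatrix}
+ \begin{pmatrix} 2m^i_1 \\ m^i_2 \\ 0 \end{pmatrix}, \qquad (25) \]

while (24) give

\[ \frac{\partial m^1_j}{\partial t} = 2m^2_j - m^1_j, \qquad \frac{\partial m^2_j}{\partial t} = m^1_j, \]

and

\[ \frac{\partial m^2_{jk}}{\partial t} = m^1_{jk} + (m^1_j m^2_k + m^2_j m^1_k), \qquad
\frac{\partial m^1_{jk}}{\partial t} = (2m^2_{jk} - m^1_{jk}) + 2m^2_j m^2_k. \qquad (26) \]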
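The two sets of moment equations for this example can also be integrated numerically and compared with one another. The following is a sketch under stated assumptions: type 1 is the photon ($w = z_1$) and type 2 the electron ($z = z_2$), $\lambda = 1$ with deaths neglected as above, and the function names `rk4`, `forward` and `backward` together with the classical Runge-Kutta scheme are illustrative choices rather than anything prescribed by the text.

```python
import math

def rk4(deriv, y0, t, steps=2000):
    """Integrate dy/dt = deriv(y) from 0 to t by the classical Runge-Kutta method."""
    h = t / steps
    y = list(y0)
    for _ in range(steps):
        k1 = deriv(y)
        k2 = deriv([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = deriv([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = deriv([a + h * b for a, b in zip(y, k3)])
        y = [a + h * (p + 2 * q + 2 * r + s) / 6
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y

def forward(y):
    """Forward set: y = (m_1, m_2, m_22, m_12, m_11) for a fixed initial type."""
    m1, m2, m22, m12, m11 = y
    return [-m1 + m2,
            2 * m1,
            4 * m12 + 2 * m1,
            m22 - m12 + 2 * m11 + m2,
            2 * m12 - 2 * m11]

def backward(y):
    """Backward set: first moments a (initial photon), b (initial electron),
    then second factorial moments p (initial photon), q (initial electron)."""
    a1, a2, b1, b2, p11, p12, p22, q11, q12, q22 = y
    return [2 * b1 - a1, 2 * b2 - a2, a1, a2,
            2 * q11 - p11 + 2 * b1 * b1,
            2 * q12 - p12 + 2 * b1 * b2,
            2 * q22 - p22 + 2 * b2 * b2,
            p11 + 2 * a1 * b1,
            p12 + a1 * b2 + b1 * a2,
            p22 + 2 * a2 * b2]

if __name__ == "__main__":
    # One initial photon: first moments are 1 and 0, second moments zero.
    print(rk4(forward, [1.0, 0.0, 0.0, 0.0, 0.0], 1.0))
    print(rk4(backward, [1.0, 0.0, 0.0, 1.0] + [0.0] * 6, 1.0))
```

For one initial photon the forward output and the corresponding entries of the backward output should coincide, which gives a useful check on any closed-form solution.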


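The solution of either set, remembering that $m^i_{jk} = 0$ initially
for all $i$, $j$, $k$, we find to be

\[ m^1_1 = \tfrac13(e^t + 2e^{-2t}), \qquad m^1_2 = \tfrac23(e^t - e^{-2t}), \]

\[ m^2_1 = \tfrac13(e^t - e^{-2t}), \qquad m^2_2 = \tfrac13(2e^t + e^{-2t}), \]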

mi get get du — Hee det 

mil- $e fet det te, "m 
mil - Bet Fett du det — Jor, 

m Be Be-t_ fet L4et_ 2e-2t, 

m2) = ieu Bety get Bel, 

mo = $e a $e aý Ze = Hed 4 LR 


3.41 The effect of immigration. The multiplicative chains
treated above strictly do not allow 'immigration' into the system
from outside ('emigration' may be treated synonymously
with 'deaths'). However, as in the case of multiplicative
sequences, the effect of immigration is readily included; for
simplicity we consider problems with only one type of individual.
We assume that immigrants enter with rate coefficient $\nu(t)$. Let
the p.g.f. solution at time $t$ ($\ge \tau$) with one initial individual at
time $\tau$ in the absence of immigration be $G(z, t, \tau)$. Then, since the
chance of entry of one new individual in the interval $\tau_r, \tau_r + \Delta\tau$
is $\nu(\tau_r)\,\Delta\tau + o(\Delta\tau)$, the complete solution is, from the assumption
of independence of the individuals,

\[ \Pi_t(z) = G^n(z, t, 0)\,\lim_{\Delta\tau \to 0}\prod_{(r)}\{1 + \nu(\tau_r)\,\Delta\tau\,(G(z, t, \tau_r) - 1)\}, \]

where the intervals cover the total period $(0, t)$ and $\Pi_0(z) = z^n$.
Hence

\[ \Pi_t(z) = G^n(z, t, 0)\exp\left\{\int_0^t \nu(\tau)\,(G(z, t, \tau) - 1)\,d\tau\right\}. \qquad (1) \]

For $n = 0$ and $G(z, t, \tau) = z$, this gives the familiar Poisson
distribution

\[ \Pi_t(z) = \exp\left\{(z - 1)\int_0^t \nu(\tau)\,d\tau\right\}. \qquad (2) \]

If we put for $G(z, t, \tau)$ the solution

\[ G(z, t, \tau) = 1 + \left[\frac{e^{\rho(t)}}{z - 1} - \int_\tau^t \lambda\,e^{\rho(u)}\,du\right]^{-1} \]

of the birth-and-death process, where

\[ \rho(t) = \int_\tau^t (\mu - \lambda)\,du, \]

we obtain a general solution for a birth-death-and-immigration
process. In the homogeneous case when $\lambda$, $\mu$ and $\nu$ are constant,
we find (for $\lambda \ne 0$)

\[ \Pi_t(z) = \frac{(\lambda - \mu)^{\nu/\lambda}\,[\mu(T - 1) - (\mu T - \lambda)z]^n}{[\lambda T - \mu - \lambda(T - 1)z]^{\,n + \nu/\lambda}}, \qquad (3) \]

where $T = e^{(\lambda - \mu)t}$. When $n = 0$, this is a negative binomial
distribution, and in particular it includes the negative binomial
distribution obtained at the end of § 3.2 (put $\mu = 0$, and note that
the chance of an increase by one in $\Delta t$ when $n$ individuals are
present is $(\lambda n + \nu)\,\Delta t + o(\Delta t)$, the form of $\lambda_n$ assumed for 'birth
processes' in the more general sense there defined).

The effect of immigration may be, as in the case of sequences,
to produce a stable limiting distribution for $N_t$. Thus suppose
$G(z, t, \tau) \to 1$ as $t$ increases; an equilibrium distribution then
exists if

\[ \lim_{t \to \infty}\int_0^t \nu(\tau)\,(G(z, t, \tau) - 1)\,d\tau \]

exists. For a homogeneous process, i.e. $\nu$ constant and

\[ G(z, t, \tau) = H(z, t - \tau), \]

we then have

\[ \Pi(z) = \exp\left\{\nu\int_0^\infty (H(z, u) - 1)\,du\right\}. \qquad (4) \]

For example, for $\mu > \lambda$, equation (3) gives

\[ \Pi(z) = \left[\frac{\mu - \lambda}{\mu - \lambda z}\right]^{\nu/\lambda}. \]

On the other hand, if $\lambda = 0$, it is easily shown that equation (3)
is replaced by

\[ \Pi_t(z) = [1 + T(z - 1)]^n \exp\{\nu(z - 1)(1 - T)/\mu\}, \qquad (5) \]

where $T = e^{-\mu t}$, this tending to $\exp\{\nu(z - 1)/\mu\}$ as $t \to \infty$. This
shows how a limiting Poisson distribution arises when deaths
(or emigrations) are counterbalanced by immigration; alternatively,
a negative binomial distribution can arise if births also
occur. The process (5) is an important one in practice, for it
may approximately represent a situation where particles move
independently in and out of a certain small volume under
observation. If we assume that immigrations take place at
rate $\nu$, and emigrations are proportional to the number $N_t$ inside
the volume, the solution (5) follows. It has been used as a model
for colloidal particles in suspension and also for the movement
of spermatozoa (see Chandrasekhar (1943) and Rothschild (1953)
respectively; see also §§ 5.21, 6.31, 8.3 and 9.13).


3.42 Point processes. Many stochastic processes, and in
particular multiplicative processes, arise in physics and biology
where we have to deal with particles or individuals distributed
in a continuous infinity of states. The elementary birth-and-death
processes discussed in § 3.4 are unrealistic whether applied
to population growth or to cosmic-ray showers, for the behaviour
of individuals depends on their age $x$ and of particles on their
energy $\epsilon$. We may regard this parameter as a new continuous
parameter of a stochastic process, in addition to the time $t$, but
from the present development the natural viewpoint is of an
evolving continuous set of random variables.

The immediate difficulty is that only probability densities can
be attached to particular values of the age $x$ or energy $\epsilon$, and not
non-zero probabilities (stochastic processes of this type will be
called point processes). Integration of the probability density
can only yield the first-order expectation of the numbers of
particles or individuals in prescribed ranges of $\epsilon$ or of $x$, whereas
we require also to study fluctuations in these numbers. To
overcome this obstacle in the cosmic-ray problem, Bhabha and
Ramakrishnan introduced higher-order density functions, called
by Ramakrishnan product-densities. A somewhat similar procedure
was introduced independently in the population problem
by D. G. Kendall, who noted its connection with the use of the
characteristic functional.
Let $N(x, t)$ represent the number of individuals at time $t$ with
the parametric value $X \le x$. We assume that the density relation

\[ f_1(x, t)\,dx = E\{dN(x, t)\} \qquad (1) \]

exists, such that the probability of one individual in $x, x + \delta x$ is
$f_1\,\delta x + o(\delta x)$, and the total probability of more than one, $o(\delta x)$.
As already noted, the integral of $f_1(x)$ over $x$ yields only the mean
number of individuals in the range of integration (the addition
law of probability does not apply in its simple form, for the
events are not in general mutually exclusive).

However, if we also form the product-density of order two,

\[ f_2(x_1, x_2)\,dx_1\,dx_2 = E\{dN(x_1)\,dN(x_2)\} \quad (x_1 \ne x_2), \qquad (2) \]

this may be interpreted as the simultaneous probability that an
individual is in $x_1, x_1 + dx_1$ and another in $x_2, x_2 + dx_2$, provided
the two differential elements do not overlap (regardless of the
numbers of individuals in other ranges of $x$). When $x_1 = x_2$,

\[ E\{[dN(x_1)]^2\} = E\{dN(x_1)\} = f_1(x_1)\,dx_1. \qquad (3) \]

In view of this degeneracy, we obtain the formula

\[ E\{(\Delta N)^2\} = \int_x^{x+h} f_1(u)\,du + \int_x^{x+h}\!\!\int_x^{x+h} f_2(u_1, u_2)\,du_1\,du_2, \qquad (4) \]

where $\Delta N$ is the number of individuals in the range $x, x + h$.
The density-functions $f_1$ and $f_2$ may consistently be thought of
as factorial-moment densities, for (4) is equivalent to

\[ E\{\Delta N(\Delta N - 1)\} = \int_x^{x+h}\!\!\int_x^{x+h} f_2(u_1, u_2)\,du_1\,du_2. \qquad (5) \]

We may similarly define higher-order densities, degeneracies
occurring whenever at least two of the differential elements
coincide. To perceive the relation between $f_r$ and the $r$th factorial
moment of $N$ in a finite range of $x$, it is sufficient to note that the
contributions ($\ge 0$) to $f_r$ and the $r$th factorial moment of $N$ must
vanish together when the number of individuals in the given
range of $x$ is $\le r - 1$. The general relation between $E\{[\Delta N(x)]^r\}$
and $f_s$ is thus

\[ E\{[\Delta N(x)]^r\} = \sum_{s=1}^{r} c^r_s \int\!\cdots\!\int f_s(x_1, \ldots, x_s)\,dx_1 \cdots dx_s, \qquad (6) \]

where the $c^r_s$ are obtained from the relations

\[ n^r = \sum_{s=1}^{r} c^r_s\,n(n - 1)\cdots(n - s + 1) \quad (n = 1, \ldots, r). \qquad (7) \]

The $c^r_s$ in terms of the 'finite differences of zero' are $\Delta^s 0^r/s!$ and
have been tabulated for $r, s = 2$ to $25$ by Stevens (1937-8).
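The coefficients in (6) and (7) are the numbers usually called Stirling numbers of the second kind, and a table is easily regenerated. The sketch below computes them from the finite differences of zero and verifies relation (7); the helper names `c` and `falling` are illustrative, and Python 3.8 or later is assumed for `math.comb`.

```python
from math import comb, factorial

def c(r, s):
    """c^r_s = (s-th forward difference of x^r at x = 0) / s!."""
    diff = sum((-1) ** (s - j) * comb(s, j) * j ** r for j in range(s + 1))
    return diff // factorial(s)

def falling(n, s):
    """Falling factorial n(n - 1)...(n - s + 1)."""
    out = 1
    for k in range(s):
        out *= n - k
    return out

if __name__ == "__main__":
    for r in range(1, 6):
        row = [c(r, s) for s in range(1, r + 1)]
        # Relation (7): n^r equals the sum over s of c^r_s times the falling factorial.
        ok = all(n ** r == sum(c(r, s) * falling(n, s) for s in range(1, r + 1))
                 for n in range(1, r + 1))
        print(r, row, ok)
```

For $r = 2$ both coefficients equal one, which reproduces formula (4): the second moment is the integral of $f_1$ plus the double integral of $f_2$.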

In the case of individuals with no interdependence of any
kind between different $x$ intervals (a fixed number $n$ of independent
individuals does not permit this property, for if a
particular individual is not in one range it is more likely to be
in another), we have

\[ f_r(x_1, \ldots, x_r) = f_1(x_1)\,f_1(x_2)\cdots f_1(x_r), \]

whence the $r$th factorial moment of $\Delta N$ is given by $[E\{\Delta N\}]^r$, a
property of the Poisson distribution.

To see how these densities are related to the use of the characteristic
functional, let us define the complete characteristic
functional with respect to $N(x)$

\[ C[\phi(x); t] = E\left\{\exp\left[i\int \phi(x)\,dN(x, t)\right]\right\}. \qquad (8) \]

This functional contains automatically all the above moments;
for example, putting $\phi(u) = \phi$ for $u$ between $x$ and $x + h$, and 0
otherwise, and expanding in powers of $\phi$, we have

\[ C(\phi) = 1 + i\phi\,E\{\Delta N\} - \tfrac12\phi^2 E\{(\Delta N)^2\} \ldots. \]

It is convenient to relate this formalism to the previous equations
for multiplicative chains by writing $i\phi(u) = \log z(u)$. Consider,
for example, a cascade shower problem where one type of particle
of energy $\epsilon$ gives rise after 'collision' to two new particles
of the same type, with energies $\epsilon_1$ and $\epsilon_2$, the probability for an
interval $dt$ being

\[ \omega(\epsilon_1, \epsilon_2 \mid \epsilon)\,d\epsilon_1\,d\epsilon_2\,dt \quad (\epsilon_1 \le \epsilon_2;\ \epsilon_1 + \epsilon_2 \le \epsilon). \]

(This is the 'nucleon cascade' problem, and as it involves only
one type of particle is somewhat simpler than the electron-photon
cascade shower already referred to, which may, however,
be treated by similar methods; see, for example, Ramakrishnan
(1952), Bartlett and Kendall (1951), and further literature
referred to.) Referring back to the equations (1) and (2) of § 3.4
for more than one type of individual, the function $g(z)$ now
generalizes to

\[ g(z(u) \mid \epsilon) = \iint \omega(\epsilon_1, \epsilon_2 \mid \epsilon)\,[z(\epsilon_1)z(\epsilon_2) - z(\epsilon)]\,d\epsilon_1\,d\epsilon_2. \qquad (9) \]

This immediately yields the equation

\[ \frac{\partial \Pi_t(z(u) \mid \epsilon)}{\partial t} = \iint \omega(\epsilon_1, \epsilon_2 \mid \epsilon)\,[\Pi_t(z(u) \mid \epsilon_1)\,\Pi_t(z(u) \mid \epsilon_2) - \Pi_t(z(u) \mid \epsilon)]\,d\epsilon_1\,d\epsilon_2. \qquad (10) \]

The 'forward' equation is not quite so neatly generalized, but
could be expressed in 'differential' form

\[ d\Pi_t(z(u)) = \Pi_t(z(v) + g(z(u) \mid v)\,dt) - \Pi_t(z(v)), \qquad (11) \]

where $g(z(u) \mid v)$ is given by (9) with $v$ replacing $\epsilon$. This gives the
formal extension of equation (1) of § 3.4

\[ \frac{\partial \Pi_t(z(u) \mid \epsilon_0)}{\partial t} = \iiint \omega(\epsilon_1, \epsilon_2 \mid v)\,[z(\epsilon_1)z(\epsilon_2) - z(v)]\,\frac{\partial \Pi_t(z(u) \mid \epsilon_0)}{\partial z(v)\,dv}\,d\epsilon_1\,d\epsilon_2\,dv, \qquad (12) \]

where, as the probability of a particle in the range $v, v + dv$ is
$O(dv)$, the 'functional derivative' is formally defined and written
as $\partial \Pi_t(z(v) \mid \epsilon)/[\partial z(v)\,dv]$ (cf. Hopf, 1952).
The first and second product-density equations may for 

be written down. It is convenient to define the symmetric: 


function wle, ca | 6) = Mw (616 |e) +(e €1| €)) 
for which all values of $\epsilon_1$, $\epsilon_2$ (such that $\epsilon_1 + \epsilon_2 \le \epsilon$) are relevant.
Then if 


footer cele) de, 7196.19. 
[fete €; | €) de, de, — W (c), 
it is readily found that 
AELS [tuo Qe | du We) 519), | 


HELIS finvo Qs |u)du+ [fazu | e) Oly |u)du 


+2) file | €) fe, y | u) du— (W(z) + W(y)] fale, y | A 


a3) 
corresponding to (12), or, alternatively, 


ð 
219. fy |) Q(u | ) du — W (e) file D 
MEO hey |u) 9(« | e) du — We) fola,y|e)p (14) 


+ 2 [rice |e) Aly | v)e(u, v | 9) dude. 
These equations may be solved if it is assumed that 
9(6,, Ea | €) de, de, =w" (s), N2) dy, dijs, 


where €, — em, e,— €». For example, taking the Mellin trans- 
forms of f, and f, in (13) with respect to x and y for given e, so that 


yi(5)) = I Silx|€) ada, (31, 8) = fne. y |6) à ty de dy, 


and choosing for convenience the scale of t so that o(1,1)— 1, 


where P 
(81, 8) = ffe (Mis Me) 1) 2° dij, dha, 
we have 
£z (201, 8, -1)", 


(15) 
a (2241, 8, + 2a(1, 8) — 2) + 225, 5,)y (s +5 — 1). 
This leads to the solution


Yı = E17} e-l17220, 514 
2a(s,, Sa) esrts2-2{e-t1—2atl, 553-0 g (27220, 5))- 240, 5910 (16) 


yas 
1—22(1,5,) — 2a(1, 8) + 2«(1, 8, - 55 — 1) 


The higher density transforms y, may be obtained in succession 
by an extension of these methods. 


3.5 General equations for Markov processes 

In the previous section on multiplicative chains in continuous
time, the primitive iterative relations for simple multiplicative
chains in discrete time led formally but naturally to
differential and integro-differential equations of various types.
Multiplicative chains are examples of Markov processes, and the
classification of these various equations will become even clearer
if we examine the character of the 'forward' and 'backward'
equations for Markov processes in general.

Let us denote the 'state' of the process by $X(t)$, and form the
characteristic function $C_t(\phi)$ of $X(t)$ at the time $t$. Then by
definition

\[ \frac{\Delta C_t(\phi)}{\Delta t} = E\left\{e^{i\phi X}\left[\frac{e^{i\phi\,\Delta X} - 1}{\Delta t}\right]\right\}, \]

where $\Delta$ is the usual symbol for a first difference. Hence if the
conditional average for given $X$ at time $t$ of the square-bracketed
quantity has a limit (almost always) as $\Delta t \to 0$, say $\Psi(i\phi, t, X)$,
then

\[ \frac{\partial C_t(\phi)}{\partial t} = E\{e^{i\phi X}\,\Psi(i\phi, t, X)\}. \qquad (1) \]

In operational form, if we may commute the expectation and
differentiation symbols,

\[ \frac{\partial C_t(\phi)}{\partial t} = \Psi\!\left(i\phi, t, \frac{\partial}{\partial(i\phi)}\right) C_t(\phi), \qquad (2) \]

where the operator $\partial/\partial(i\phi)$ acts only on $C_t(\phi)$.

This equation will hold if the limiting functions exist even if
the process is not Markovian, but the function $\Psi$ may not then
be independent of further past history as it must for a Markov
process, and so (2) can no longer be used to obtain a solution for
given initial conditions. Equation (2) includes the partial
differential equation obtained for multiplicative chains, and in
fact includes a wider class of partial differential equations
available for Markov chains of the population type, where $X(t)$
is a positive integer $N(t)$. In terms of the p.g.f. $\Pi_t(z)$, for such
chains it becomes

\[ \frac{\partial \Pi_t(z)}{\partial t} = \Psi\!\left(\log z, t, z\frac{\partial}{\partial z}\right)\Pi_t(z). \qquad (3) \]
The use of an equation of this last type has been introduced
independently by more than one writer, but appears first to have
been made by Palm (1943). The extension of equations (1), (2)
and (3) to cover explicitly more than one variable is evident.
It will be found that the appropriate equation for many Markov
chains can be written down at once in terms of the possible
transitions. For example, if we return to the simplified electron-photon
cascade model discussed in § 3.4 (equation (18)), we have
the following schematic picture of the possible transitions:

    Type of transition    Rate           Operator
    $w \to z^2$           $\lambda_1$    $\partial/\partial w$
    $w \to 1$             $\mu_1$        $\partial/\partial w$
    $z \to zw$            $\lambda_2$    $\partial/\partial z$
    $z \to 1$             $\mu_2$        $\partial/\partial z$

This gives rise to the equation

\[ \frac{\partial \Pi_t}{\partial t} = [\lambda_1(z^2 - w) + \mu_1(1 - w)]\,\frac{\partial \Pi_t}{\partial w} + [\lambda_2(zw - z) + \mu_2(1 - z)]\,\frac{\partial \Pi_t}{\partial z}. \]

If, as in immigration, the chance of transition were independent
of the number of individuals, the transition would be of the type
$1 \to z$, and correspondingly, the operator would be 1. If it were
proportional to the product of the numbers of two types of
individual, as in molecular association in problems of statistical
mechanics or in epidemiological problems where the chance of
infection depends on contact between infected and uninfected
persons (see § 4.4), the transition would be of the type $zw \to \ldots$,
and, correspondingly, the operator $\partial^2/\partial z\,\partial w$ would appear (in
the epidemiological application if $z$ relates to uninfected persons
and $w$ to infected persons, the transition is $zw \to w^2$). Unfortunately
higher-order terms in the variables $z, w, \ldots$ or in the
operators are liable to lead to intractability in the solution, even
if the equation itself is simple to write down. Sometimes when
the equilibrium or limiting distribution is of interest, as in
statistical-mechanical applications, this is obtainable, even when
the complete distributional solution at any time $t$ is not (see, for
example, the discussion of the equilibrium distributions in
quantum kinetics by Moyal (1949)).

Returning to the general equation (1), we note that if the limit
$E_{t,X}\{[e^{i\phi\,\Delta X} - 1]/\Delta t\}$ exists for given $X$ as $\Delta t \to 0$, then we have
also the same limit

\[ \lim_{\Delta t \to 0} \log E_{t,X}\{e^{i\phi\,\Delta X}\}/\Delta t = \Psi(i\phi, t, X). \]

This indicates, as noted in § 3.1, that $\Psi$ has the local structure
of an additive process, and contains two possible components,
a normal or diffusion term, and a discontinuous or transition
term (see also Moyal (M)). The Markov chains we have been
discussing obviously belong to the latter type. For pure diffusion
processes containing merely the first component, we have

\[ \Psi(i\phi, t, X) = m(t, X)\,i\phi - \tfrac12\sigma^2(t, X)\,\phi^2, \]

whence (1) becomes

\[ \frac{\partial C_t(\phi)}{\partial t} = E\{e^{i\phi X}\,[m(t, X)\,i\phi - \tfrac12\sigma^2(t, X)\,\phi^2]\}. \]

Inverting this equation to obtain the corresponding diffusion
equation for the probability density, we have in general

\[ \frac{\partial f_t(x)}{\partial t} = -\frac{\partial}{\partial x}[f_t(x)\,m(t, x)] + \frac12\frac{\partial^2}{\partial x^2}[f_t(x)\,\sigma^2(t, x)]. \qquad (4) \]

Consider now the corresponding formalism for the backward
equation. We must now insert the initial condition $X = x_0$ at $t_0$
explicitly in the distribution function $F(x \mid x_0, t_0)$, and may write

\[ F(x \mid x_0, t_0) = \int F(x \mid x_0 + \Delta x_0, t_0 + \Delta t_0)\,dG(\Delta x_0 \mid x_0, t_0), \]

or in terms of $C(\phi)$,

\[ C(\phi \mid x_0, t_0) = \int C(\phi \mid x_0 + \Delta x_0, t_0 + \Delta t_0)\,dG(\Delta x_0 \mid x_0, t_0), \]

where the integral is for the conditional variable $\Delta x_0$ with
distribution function $G$. This leads to the differential equation

\[ -\frac{\partial C(\phi \mid x_0, t_0)}{\partial t_0} = \lim_{\Delta t_0 \to 0}\int [C(\phi \mid x_0 + \Delta x_0, t_0 + \Delta t_0) - C(\phi \mid x_0, t_0 + \Delta t_0)]/\Delta t_0\;dG(\Delta x_0 \mid x_0, t_0), \qquad (5) \]

considered for variable $x_0$. By taking some appropriate transform,
with respect to $x_0$, of $F(x)$ (or $C(\phi)$), an equation resembling
(2) (or its transform with respect to $x$) may be obtained. However,
some precautions on the admissible joint range of integration
of $x$ and $x_0 + \Delta x_0$ are necessary, and equation (5) seems
often more useful as it stands. For example, if $G(\Delta x_0 \mid x_0, t_0)$
represents a pure diffusion distribution, we easily obtain the
analogue of (4)

\[ -\frac{\partial f(x \mid x_0, t_0)}{\partial t_0} = m(t_0, x_0)\,\frac{\partial f(x \mid x_0, t_0)}{\partial x_0} + \frac12\sigma^2(t_0, x_0)\,\frac{\partial^2 f(x \mid x_0, t_0)}{\partial x_0^2}. \qquad (6) \]

Further, in the important case of multiplicative chains, $C_t(\phi \mid x_0)$
can always be expressed as $[C_t(\phi \mid 1)]^{x_0}$, enabling an equation
for variable $x_0$ to be converted to one for $C(\phi \mid 1)$. Even without
this simplification equation (5) is valuable for studying the
admissible solutions. We shall examine it

\[ \int dg(y \mid x) = \lambda(x). \]

Then (5) is equivalent to the integral equation, alternatively
written down from the renewal or regenerative property of $x$
and $x + y$,

\[ C_t(\phi \mid x) = \int_0^t\!\!\int_{-\infty}^{\infty} C_u(\phi \mid x + y)\,e^{-(t-u)\lambda(x)}\,du\,dg(y \mid x) + e^{i\phi x - t\lambda(x)}. \qquad (7) \]


In this equation one transition has occurred in the first right-hand
term at time $t - u$, and if we define $C_t^{(n)}$ as the solution compounded
of zero, one, ..., $n$ transitions only (in time $t$), we have
the iterative solution of (7)

\[ C_t^{(n)}(\phi \mid x) = \int_0^t\!\!\int_{-\infty}^{\infty} C_u^{(n-1)}(\phi \mid x + y)\,e^{-(t-u)\lambda(x)}\,du\,dg(y \mid x) + e^{i\phi x - t\lambda(x)}. \qquad (8) \]

We could, alternatively, have written down equivalent equations
for $F_t(x_1 \mid x)$ and $F_t^{(n)}(x_1 \mid x)$, where $F_t^{(n)}(x_1 \mid x)$ is necessarily
increasing with $n$ and bounded above by unity, and hence has a
limit $F_t(x_1 \mid x)$. If $F_t(\infty \mid x) = 1$, this limit is a valid distribution
representing the solution to our equation. If $F_t(\infty \mid x) < 1$, it
implies that part of the total probability has disappeared, and
$F_t(x_1 \mid x)$ would only represent the solution in a somewhat wider
sense. We shall not consider such cases further here (for a
detailed discussion see Moyal, 1957).
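Taking the Laplace transform of (7) gives us, say,

\[ L_\psi(\phi \mid x) = \frac{\displaystyle\int L_\psi(\phi \mid x + y)\,dg(y \mid x) + e^{i\phi x}}{\psi + \lambda(x)}. \qquad (9) \]

If $F_t(\infty \mid x) = 1$, $C_t(0 \mid x) = 1$ and $L_\psi(0 \mid x) = 1/\psi$. This enables the
conditions for a valid solution (in the sense just defined) to be
examined. Thus for a homogeneous 'birth-and-death' process
extending the 'birth' process of § 3.2 to include transitions up
or down by one, we have

\[ dg(y \mid x) = [\lambda_x\,\delta(y - 1) + \mu_x\,\delta(y + 1)]\,dy, \]

where $\delta(y)$ is the Dirac $\delta$-function. Thus (9) becomes

\[ L_\psi(\phi \mid x) = \frac{\lambda_x L_\psi(\phi \mid x + 1) + \mu_x L_\psi(\phi \mid x - 1) + e^{i\phi x}}{\psi + \lambda_x + \mu_x}. \qquad (10) \]

For $\phi = 0$, $\psi > 0$, this gives, for the quantity
$J_\psi(x) = 1 - \psi L_\psi(0 \mid x) \ge 0$,

\[ J_\psi(x) = \frac{\lambda_x J_\psi(x + 1) + \mu_x J_\psi(x - 1)}{\psi + \lambda_x + \mu_x}. \qquad (11) \]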
This recurrence relation in $x$ should be satisfied only by $J_\psi(x) \equiv 0$.
For convenience we assume $\lambda_r \ne 0$ ($r = 0, 1, \ldots$), for otherwise
it is obvious that $J_\psi(x) = 0$ for all initial values not greater than
the maximum $x$ for which $\lambda_x = 0$, and we may re-number the
states upwards from this $x$ as zero (in the frequent case where
$\lambda_0 = 0$ this would imply re-numbering from $r = 1$). In the
particular case $\mu_x = 0$ for all $x$, (11) gives

\[ J_\psi(n) = J_\psi(0)\prod_{r=0}^{n-1}\left(1 + \frac{\psi}{\lambda_r}\right); \]

$J_\psi(n)$ is from its definition necessarily bounded for all $n$, so that
$J_\psi(x) \equiv 0$ if and only if the infinite product diverges, i.e. if and
only if $\sum 1/\lambda_r$ diverges, the condition quoted in § 3.2. In the case
$\lambda_x$ and $\mu_x$ both non-zero, D. G. Kendall has shown (unpublished;
cf. also Dobrušin, 1952) that (11) permits no solution other than
$J_\psi(x) \equiv 0$ if and only if $\sum W_r$ diverges, where


rol, i Pita sse pt 

"CM orent ren aa 
For example, for the multiplicative birth-and-death process for
which $\lambda_n$ and $\mu_n$ are proportional to $n$, $\sum 1/\lambda_r$ and hence also
$\sum W_r$ diverge, and thus a valid solution for which $F(\infty) = 1$ exists
for all finite $t$. It may be noticed from the solution of the simple
multiplicative birth-and-death process with $\lambda_n = n\lambda$, $\mu_n = n\mu$
given in § 3.4 that $F(\infty)$ may not remain unity as $t \to \infty$; thus, for
$\lambda > \mu$, $F(\infty) = \Pi_\infty(1) = (\mu/\lambda)^x$ for $x$ individuals at $t = 0$, showing
that the probability not absorbed at zero all moves eventually
to infinity, so that the process is still dissipative.
Chapter 4
MISCELLANEOUS STATISTICAL APPLICATIONS


4.1 Some applications of the random walk or additive
process

In this chapter we select some more problems in mathematical 
statistics (excluding applications in physics), both for their 
intrinsic interest and in order to demonstrate further the 
application of the methods so far developed. The topics dis- 
cussed, especially in this first section, may appear somewhat 
miscellaneous, but are nevertheless linked by means of the 
appropriate stochastic process technique. Thus as further 
examples of ‘random walks’ with absorbing boundaries we discuss 
in this section (i) a problem of collective insurance risk, (ii) the 
sampling theory of distribution functions, and (iii) sequential 
analysis. 

A problem in insurance risk. Consider the insurance risk of
a firm with initial capital $b$, increasing premiums $\mu$ per unit
time, and claims occurring at random at an average rate $\nu$. The
amount of a claim follows the distribution with m.g.f. $M(\theta)$.
It is required to know the chance $P_b$ of the firm going bankrupt.

This problem is merely a generalization of the gambler's ruin
problem. Although here we have a problem in continuous time,
it has been noted that Wald's identity (§ 2.1) is still applicable.
We take the other boundary $a = -\infty$ (measuring for convenience
claims positively, bankruptcy corresponding to $X = b$).
We assume $\nu M'(0) - \mu < 0$ (otherwise $P_b = 1$), and the relation for
$P_b$ is then

\[ P_a E_a\{e^{\theta_0 X(T)}\} + P_b E_b\{e^{\theta_0 X(T)}\} = 1, \qquad (1) \]

where $\theta_0$ ($> 0$) is the root of the equation

\[ \exp\,(\nu(M(\theta) - 1) - \mu\theta) = 1 \qquad (2) \]

(cf. formula (7) of § 3.1, the extra term $\exp(-\mu\theta)$ arising from
the negative constant drift $\mu$). In (1) note that for $X(T)$ in the
first term, $X(T) \le a$, and as $a \to -\infty$ we obtain

\[ P_b = 1/E_b\{e^{\theta_0 X(T)}\} \le e^{-\theta_0 b}. \qquad (3) \]

In the asymptotic case when $b$ is large compared with individual
claims, we have the approximate equality $P_b \sim e^{-\theta_0 b}$.
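In practice the root $\theta_0$ of (2) must usually be found numerically. The sketch below uses bisection on $h(\theta) = \nu(M(\theta) - 1) - \mu\theta$, which is negative just to the right of zero and positive beyond the root. The function name `adjustment_coefficient`, the bracketing argument `theta_max` and the exponential claim distribution in the example are all assumptions made for illustration.

```python
import math

def adjustment_coefficient(mgf, nu, mu, theta_max, iters=200):
    """Positive root of nu*(M(theta) - 1) - mu*theta = 0, found by bisection.

    theta_max must lie beyond the root but inside the range where the m.g.f. exists.
    """
    def h(theta):
        return nu * (mgf(theta) - 1.0) - mu * theta

    lo, hi = 1e-6 * theta_max, theta_max
    if not (h(lo) < 0.0 < h(hi)):
        raise ValueError("need h < 0 near zero and h > 0 at theta_max")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Illustrative case: exponential claims with unit mean, M(theta) = 1/(1 - theta),
    # claim rate nu = 1 and premium rate mu = 1.25.
    theta0 = adjustment_coefficient(lambda th: 1.0 / (1.0 - th), 1.0, 1.25, 0.99)
    for b in (5.0, 10.0, 20.0):
        print(b, theta0, math.exp(-theta0 * b))  # bound (3) on the chance of ruin
```

For exponential claims with mean $m$ the root is available in closed form, $\theta_0 = 1/m - \nu/\mu$, which gives a convenient check on the numerical result.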

This result has been given by various writers using more direct
methods (see, for example, Segerdahl, 1939), and it is of interest
to demonstrate the equivalence of (3) to the usual solution.

The amount of capital at time $t$ is given by

\[ Z(t) = b - X(t) = b + \mu t - Y(t), \]

where the total amount $Y(t)$ of all claims has m.g.f. given by
$\exp\,(\nu t(M(\theta) - 1))$. Let the probability of bankruptcy at some time
subsequent to a stage at which the capital is $Z$ be $P(Z)$. Then
from the situation at time $t + \Delta t$ we deduce that

\[ P(Z) = (1 - \nu\,\Delta t)\,P(Z + \mu\,\Delta t) + \nu\,\Delta t\int_0^Z P(Z - u)\,dF(u) + \nu\,\Delta t\,[1 - F(Z)] + o(\Delta t), \]

where $F(u)$ is the distribution function corresponding to $M(\theta)$.
This gives as $\Delta t \to 0$ (it is assumed that the derivative $P'(Z)$
exists)

\[ -\mu P'(Z) + \nu P(Z) = \nu[1 - F(Z)] + \nu\int_0^Z P(Z - u)\,dF(u), \]

or, for $Z = b$ at $t = 0$, a probability $P(b)$ satisfying the equation

\[ -\frac{\mu}{\nu}P'(b) + P(b) = 1 - F(b) + \int_0^b P(b - u)\,dF(u). \]

Integrating the last term by parts, we obtain

\[ -\frac{\mu}{\nu}P'(b) + P(b) = 1 - F(b) - \Big[P(b - u)\,[1 - F(u)]\Big]_0^b - \int_0^b P'(b - u)\,[1 - F(u)]\,du, \]

or

\[ -\frac{\mu}{\nu}P'(b) = [1 - P(0)]\,[1 - F(b)] - \int_0^b P'(b - u)\,[1 - F(u)]\,du. \]

Integrating with respect to $b$ from $b$ to $\infty$, we have finally an
integral equation of Volterra type for $P(b)$,

\[ P(b) = \frac{\nu}{\mu}\int_b^\infty [1 - F(u)]\,du + \frac{\nu}{\mu}\int_0^b P(b - u)\,[1 - F(u)]\,du. \]

Methods of solution of an integral equation of this type were
indicated in § 2.11. The asymptotic solution for large $b$ is given
by $P(b) \sim Ce^{-\theta_0 b}$, where $\theta_0$ ($= -\psi$) is the positive root of the
equation

\[ \frac{\nu}{\mu}\int_0^\infty e^{\theta u}\,[1 - F(u)]\,du = 1, \]

or, on integration by parts, of

\[ \frac{\nu}{\mu}\left\{\left[\frac{e^{\theta u}}{\theta}\,[1 - F(u)]\right]_0^\infty + \frac{1}{\theta}\int_0^\infty e^{\theta u}\,dF(u)\right\} = 1, \]

i.e. of

\[ \nu(M(\theta) - 1) - \mu\theta = 0, \]


which is equivalent to equation (2). To identify the two asymp- 
totic solutions we have to show also that C~1. Quoting again 
from the solution in § 2-11 we have under the conditions given 


there " 
0y( — 0o 
C=a(-9 / > 

(799 | 83, 

where ee 6) =|, epp Yn — Fwy] dv | du, 

0 ul 

which on successive integration by parts reduces to 

v yt—8) 
a(-0)- —5 vdF()*t—5-* 


Y(— 0) being given by v[M(8) — 1]/(/9). Hence 


l1 vp 
a 00) = 9, ~ io 


where ji, is the mean of the F(u) distribution; and 


ay(-0) | "LM(09 =) vM' (09) 
90 — pO Uo 
1 , vM'(0o) 
=— d, dt 79 . 
pii. (4) 


Thus C= VW 89-5 
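As a numerical illustration of the root $\theta_0$ of $\nu(M(\theta) - 1) - \mu\theta = 0$ and of the constant C in the form $(\mu - \nu\mu_1)/(\nu M'(\theta_0) - \mu)$, the sketch below finds $\theta_0$ by bisection. The exponential claim distribution, the parameter values and the function names are assumptions made for the example only. For exponential claims with mean $\mu_1$ one expects $\theta_0 = 1/\mu_1 - \nu/\mu$ and $C = \nu\mu_1/\mu$.

```python
import math

def adjustment_coefficient(nu, mu, mgf, theta_max, tol=1e-12):
    """Positive root of nu*(M(theta) - 1) - mu*theta = 0 by bisection.
    Assumes mu > nu*mu1 (so the function is negative just above 0)
    and that it is positive at theta_max."""
    def f(theta):
        return nu * (mgf(theta) - 1.0) - mu * theta
    lo, hi = 1e-9, theta_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def constant_c(nu, mu, mu1, mgf_prime, theta0):
    """C = (mu - nu*mu1) / (nu*M'(theta0) - mu)."""
    return (mu - nu * mu1) / (nu * mgf_prime(theta0) - mu)

# Illustrative values (assumptions): exponential claims with mean mu1.
nu, mu, mu1 = 1.0, 1.25, 1.0
mgf = lambda th: 1.0 / (1.0 - mu1 * th)              # M(theta), valid for theta < 1/mu1
mgf_prime = lambda th: mu1 / (1.0 - mu1 * th) ** 2   # M'(theta)

theta0 = adjustment_coefficient(nu, mu, mgf, 0.999 / mu1)
C = constant_c(nu, mu, mu1, mgf_prime, theta0)
print(theta0, C)
```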
To proceed further, we assume conditions under which as b increases the probability P(b) is still of some interest as a non-vanishing quantity, so that $\theta_0$ must become small. Hence we expand $M(\theta_0)$ in powers of $\theta_0$ and equation (2) becomes, say,

$$\nu\left(\mu_1\theta_0 + \tfrac{1}{2}\mu_2\theta_0^2 + o(\theta_0^2)\right) - \mu\theta_0 = 0,$$

or

$$\theta_0 = \frac{2(\mu - \nu\mu_1)}{\nu\mu_2} + o(\theta_0). \tag{5}$$

Hence

$$\nu M'(\theta_0) - \mu = \nu\mu_1 - \mu + \nu\mu_2\theta_0 + o(\theta_0) = \mu - \nu\mu_1 + o(\theta_0).$$

In (5) the assumption that $\theta_0$ is small is satisfied for finite $\mu_2\nu$ when the mean rate of increase in capital, $\mu - \mu_1\nu$, is small. We have then $\mu - \nu\mu_1 = O(\theta_0)$ and equation (4) reduces to $C = 1 + o(1)$.

It will be noticed that the first method gives the simpler solu- 
tion (3), which as an inequality holds for all values of b; the second 
method may be compared with the integral equation methods of 
§§ 2-1 and 2-11, providing an exact solution if required. 

The sampling theory of distribution functions. If a sample of n independent observations Z all following the same distribution function F(z) is used to construct the empirical distribution function $F_n(z) = n_z/n$, where $n_z$ is the number of observations whose values are less than or equal to z, it is known that $F_n(z) \to F(z)$, as $n \to \infty$, in the sense that†

$$P\{\lim_{n\to\infty}\max_z\,[\,|F_n(z) - F(z)|\,] = 0\} = 1.$$

A more useful result from a practical standpoint is Kolmogorov's asymptotic expression for

$$P_\lambda = P\{\max_z\,[\,|F_n(z) - F(z)|\,] \le \lambda/\sqrt{n}\}$$

as n becomes large, in the case of F(z) continuous. It is readily seen that the value of $P_\lambda$ is invariant to the shape of the increasing continuous function F(z), for we may transform by a (1,1) correspondence to the new random variable Y = F(Z), which has a rectangular distribution G(y) in the interval (0,1); thus, since $G_n(y) \equiv F_n(z)$, $P_\lambda$ need only be evaluated for a rectangular distribution.

† From now on we use |x| for 'absolute value of x', and not ||x|| as in Chapters 2 and 3 (from § 2-2).
Now if n independent observations Y are distributed at random in the interval (0, 1), we cannot immediately apply the theory of the random walk with independent increments to the random function $N_y/n$, since $N_1/n = n/n = 1$ and the increments are obviously not independent. Consider, however, the joint distribution of n + 1 independent values $W_i$ from the same distributional law $H(w) = 1 - e^{-aw}$ (w > 0). These values have frequency

$$a^{n+1}\exp\Big\{-a\sum_{i=1}^{n+1} w_i\Big\}\prod_{i=1}^{n+1} dw_i.$$

The distribution of $U = \sum_{i=1}^{n+1} W_i$ is well known to have frequency law

$$a^{n+1}u^n e^{-au}\,du/n!,$$

whence the distribution of $W_1, \ldots, W_n$, given $U = u_0$, is

$$n!\prod_{i=1}^{n} dw_i\Big/u_0^n \qquad \Big(\sum_{i=1}^{j} w_i \le u_0 \text{ for all } j \le n\Big),$$

or for $u_0 = 1$,

$$n!\prod_{i=1}^{n} dw_i \qquad \Big(\sum_{i=1}^{j} w_i \le 1 \text{ for all } j \le n\Big).$$

Transforming to the cumulative variables

$$U_j = \sum_{i=1}^{j} W_i \qquad (j = 1, \ldots, n),$$

we obtain the equivalent frequency law

$$n!\prod_{j=1}^{n} du_j \qquad (u_j \le u_{j+1} \le 1 \text{ for all } j < n),$$

which is the distribution of an ordered set of n independent variables uniformly distributed in the interval (0,1). We may thus interpret the random quantity $X(j) = U_j - j/n$ as a random walk conditional on $X(n + 1) = -1/n = o(1/\sqrt{n})$. To apply the results of § 3-1 (equations (13) and (14)) we arrange that E(X) = 0 by choosing the mean 1/a of W to be 1/n. The increase in variance per unit interval is then $n(1/n)^2 = 1/n$. The standard deviation of fluctuations in X being of order $1/\sqrt{n}$, we take b to be of order $1/\sqrt{n}$ (b is thus large compared with 1/n, the order of magnitude of individual increments of X). Putting t = 1 and $b = \lambda/\sqrt{n}$ in equations (13) and (14) of § 3-1, we have the asymptotic probability of the deviation $U_j - j/n$ not exceeding $\lambda/\sqrt{n}$, either in both directions or in one.

As the theoretical function G(y) = y is at 45° to the y-axis, the boundaries a horizontal distance $\pm\lambda/\sqrt{n}$ from this function are also the same vertical distance; moreover, the 'corners' of the function $G_n(y)$ are the points $(U_1, 0)$, $(U_1, 1/n)$, $(U_2, 1/n)$, $(U_2, 2/n)$, ..., which include the points defined by X(j) and also corresponding points a vertical distance 1/n below them. As $1/n \ll \lambda/\sqrt{n}$, all these points are included within the $\pm\lambda/\sqrt{n}$ boundary with the same asymptotic probability, which is thus

$$\sum_{s=-\infty}^{\infty} (-1)^s e^{-2\lambda^2 s^2}. \tag{6}$$

If we require the probability of not exceeding the deviation $\lambda/\sqrt{n}$ on one side only, we have similarly the result

$$1 - e^{-2\lambda^2}. \tag{7}$$
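The two limiting probabilities are easy to evaluate. The sketch below, added for illustration, sums the two-sided series, read as the sum over all integers s of $(-1)^s e^{-2\lambda^2 s^2}$, and evaluates the one-sided expression $1 - e^{-2\lambda^2}$. The function names and the truncation at 100 terms are assumptions of the example; the truncation is adequate unless $\lambda$ is very small.

```python
import math

def kolmogorov_two_sided(lam, terms=100):
    """Sum over s = -inf..inf of (-1)^s exp(-2 lam^2 s^2), truncated at |s| <= terms.
    The s = 0 term is 1; the terms for +s and -s are equal."""
    total = 1.0
    for s in range(1, terms + 1):
        total += 2.0 * (-1) ** s * math.exp(-2.0 * lam * lam * s * s)
    return total

def kolmogorov_one_sided(lam):
    """One-sided limit 1 - exp(-2 lam^2)."""
    return 1.0 - math.exp(-2.0 * lam * lam)

for lam in (0.5, 1.0, 1.36, 2.0):
    print(lam, kolmogorov_two_sided(lam), kolmogorov_one_sided(lam))
```

The value $\lambda$ = 1.36 gives a two-sided probability close to 0.95, which is the origin of the familiar 5 per cent point of the Kolmogorov test.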


Sequential analysis. It is not the intention to discuss at length here the sampling technique in statistics known as sequential analysis (see Wald, 1947), which is a method of statistical sampling inspection in cases where the units sampled may conveniently be examined serially. Its main principles, and close relation to random-walk theory in the case of independent sample observations, can, however, be indicated. We consider the probability (or probability density in the case of a continuous distribution)

$$p_0 = P\{S \mid \alpha_0\} = \prod_{i=1}^{n} p(x_i \mid \alpha_0) \tag{8}$$

of obtaining the independent sample observations

$$S \equiv (x_1, x_2, \ldots, x_n)$$

on a hypothesis $\alpha_0$, and similarly for an alternative hypothesis $\alpha_1$, and set up a rule: when $p_0/p_1 \le A$, reject $\alpha_0$ in favour of $\alpha_1$; when $p_0/p_1 \ge B$, reject $\alpha_1$.

Now as n increases, $X(n) = \log p_0 - \log p_1 = L_0 - L_1$, say, is from (8) for $\alpha_0$ (or $\alpha_1$) true a cumulative random-walk sequence, with boundaries a = log A, b = log B; hence $P\{N = \infty\}$, where N is the value of n for which X(n) reaches one or other boundary, is zero. Let us stipulate that the chance of error in rejecting $\alpha_0$ when in fact $\alpha_0$ is true is $\epsilon_0$, and that the chance of error in rejecting $\alpha_1$ when in fact $\alpha_1$ is true is $\epsilon_1$. As the inequality $p_0 \le Ap_1$ is always satisfied when $\alpha_1$ is accepted, the over-all probability for all samples for which $\alpha_1$ is accepted when $\alpha_0$ is true is at most A times the probability when $\alpha_1$ is true. But the over-all probability for such samples is to be $\epsilon_0$ if $\alpha_0$ is true, and $1 - \epsilon_1$ if $\alpha_1$ is true. Hence

$$\epsilon_0 \le A(1 - \epsilon_1).$$

Similarly

$$1 - \epsilon_0 \ge B\epsilon_1,$$

and we must have

$$A \ge \frac{\epsilon_0}{1 - \epsilon_1}, \qquad B \le \frac{1 - \epsilon_0}{\epsilon_1}. \tag{9}$$

In the asymptotic case when the excess over the boundaries may be neglected, these inequalities become equalities.

The distributional theory of size of sample required for this sampling technique follows immediately from our earlier random-walk theory. For example, the average size of sample required is from equation (24) of § 2-1

$$E(N) = (P_a a + P_b b)/m, \tag{10}$$

where

$$a = \log A \sim \log\{\epsilon_0/(1 - \epsilon_1)\}, \qquad b = \log B \sim \log\{(1 - \epsilon_0)/\epsilon_1\}$$

and $m = E\{\log p(x \mid \alpha_0) - \log p(x \mid \alpha_1)\}$.

The asymptotic case when $\alpha_0$ and $\alpha_1$ are 'close' enough to require a large sample for discrimination is worth examination. For definiteness suppose $\alpha_0$ and $\alpha_1$ denote alternative values of an actual parameter $\alpha$ of the distribution of an individual observation, where $\alpha_0 - \alpha_1$ is small. Then

$$L_0 - L_1 = (\alpha_0 - \alpha_1)L_0' - \tfrac{1}{2}(\alpha_0 - \alpha_1)^2 L''(\alpha'), \tag{11}$$

where dashes denote partial differentiation with respect to $\alpha$, and $\alpha'$ is some value in the interval $(\alpha_0, \alpha_1)$. Similarly

$$L_0 - L_1 = (\alpha_0 - \alpha_1)L_1' + \tfrac{1}{2}(\alpha_0 - \alpha_1)^2 L''(\alpha''). \tag{12}$$
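Equation (11) is relevant when $\alpha = \alpha_0$ and (12) when $\alpha = \alpha_1$. In either case $L_0 - L_1$ is a cumulative sum of independent variables for each of which

$$m \sim (\alpha_0 - \alpha_1)E\{L_0'(x_i)\} - \tfrac{1}{2}(\alpha_0 - \alpha_1)^2 E\{L_0''(x_i)\}$$

for $\alpha_0 - \alpha_1 \sim 0$ (for $L''$ uniformly continuous in $\alpha$ in the neighbourhood of $\alpha_0$, $\alpha_1$) or

$$m \sim \tfrac{1}{2}(\alpha_0 - \alpha_1)^2 I_0 \qquad (\alpha = \alpha_0),$$

as $E\{L_0'(x_i \mid \alpha_0)\} = 0$, $E\{[L_0'(x_i \mid \alpha_0)]^2\} = -E\{L_0''(x_i \mid \alpha_0)\} = I_0$, say, under not very restrictive conditions on L (these results are well known in the theory of statistical estimation, $I_0$ being the information function introduced by R. A. Fisher; see, for example, Chapter 8). Further,

$$\sigma^2 \sim (\alpha_0 - \alpha_1)^2 E\{[L_0'(x_i \mid \alpha_0)]^2\} = (\alpha_0 - \alpha_1)^2 I_0 \qquad (\alpha = \alpha_0).$$

We thus have the normal asymptotic case of § 2-1 with

$$2m \sim \sigma^2 \sim (\alpha_0 - \alpha_1)^2 I_0 \qquad (\alpha = \alpha_0). \tag{13}$$

Similarly from equation (12) we find

$$-2m \sim \sigma^2 \sim (\alpha_0 - \alpha_1)^2 I_1 \qquad (\alpha = \alpha_1). \tag{14}$$

Equation (19) of § 2-1 thus becomes

$$P_b = \frac{1 - e^{-a}}{e^{-b} - e^{-a}} \quad (\alpha = \alpha_0); \qquad P_b = \frac{1 - e^{a}}{e^{b} - e^{a}} \quad (\alpha = \alpha_1). \tag{15}$$

If, therefore, we require $P_b = 1 - \epsilon_0$ $(\alpha = \alpha_0)$, $\epsilon_1$ $(\alpha = \alpha_1)$, we may solve equations (15) with these values inserted for $P_b$ and obtain

$$e^a = \frac{\epsilon_0}{1 - \epsilon_1}, \qquad e^b = \frac{1 - \epsilon_0}{\epsilon_1}, \tag{16}$$

in agreement with the general formula (9). The earlier theory for the normal case also provides at once the distributional theory for sample size in this general asymptotic case.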
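The agreement between the boundary values and the stipulated error rates can be checked numerically. The sketch below is an illustration only: it takes the boundaries as $e^a = \epsilon_0/(1 - \epsilon_1)$, $e^b = (1 - \epsilon_0)/\epsilon_1$, and takes the probability of reaching the upper boundary b in the asymptotic normal case as $(1 - e^{-a})/(e^{-b} - e^{-a})$ when $\alpha_0$ is true and $(1 - e^{a})/(e^{b} - e^{a})$ when $\alpha_1$ is true (these follow from $2m/\sigma^2 = \pm 1$). The function names and the error rates used are assumptions of the example.

```python
import math

def wald_boundaries(eps0, eps1):
    """Log-likelihood-ratio boundaries a = log A (lower) and b = log B (upper),
    neglecting the excess over the boundaries."""
    a = math.log(eps0 / (1.0 - eps1))
    b = math.log((1.0 - eps0) / eps1)
    return a, b

def prob_upper_boundary(a, b, alpha0_true):
    """Asymptotic probability that X(n) = L0 - L1 reaches b before a."""
    if alpha0_true:          # drift with 2m / sigma^2 = +1
        return (1.0 - math.exp(-a)) / (math.exp(-b) - math.exp(-a))
    return (1.0 - math.exp(a)) / (math.exp(b) - math.exp(a))   # 2m / sigma^2 = -1

eps0, eps1 = 0.05, 0.10      # illustrative error rates
a, b = wald_boundaries(eps0, eps1)
print(a, b)
print(prob_upper_boundary(a, b, True))    # should recover 1 - eps0
print(prob_upper_boundary(a, b, False))   # should recover eps1
```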


4-2 Simple renewal as a Markov process

The continual renewal or replacement of a single article which wears out in time with a frequency distribution f(x) dx was discussed in Chapter 2 as an example of a random walk with increments following such a distribution, but we consider it again here from the standpoint of the theory of Markov processes. It is evident that the behaviour of the process at time t depends on more than a knowledge of whether a renewal occurred at some previous chosen time $t_0$, for whether an article wears out depends on its age, i.e. on the time from the previous random renewal time $T_r$. A specification of states in terms of the age at any time t is thus a Markov process specification; in fact, the process is an example of the multiplicative chains considered in § 3-42 in the very particular case of no 'multiplication'.

In terms of the formalism of that section, we must allow for the ageing in time, so that in the interval dt we have the transition

$$z(x) \to z(x + dt)\{1 - \mu(x)\,dt\} + z(0)\mu(x)\,dt, \tag{1}$$

whence

$$\frac{\partial\Pi_t(x)}{\partial t} = \frac{\partial\Pi_t(x)}{\partial x} + \mu(x)[\Pi_t(0) - \Pi_t(x)], \tag{2}$$

where $\Pi_t(x) \equiv \Pi_t(z(u) \mid x) \to z(x)$ as $t \to 0$.

The 'death-rate' $\mu(x)$ in (1) and (2) now depends on the age, and is assumed finite for all x. Its connection with the 'generation distribution' f(x) dx is seen by writing (2) equivalently in its renewal or integral equation form (cf. § 3-5)

$$\Pi_t(x) = \int_0^t \Pi_{t-u}(0)\,\mu(u + x)\exp\Big\{-\int_0^u \mu(v + x)\,dv\Big\}du + z(x + t)\exp\Big\{-\int_0^t \mu(u + x)\,du\Big\}, \tag{3}$$

giving

$$f(x) = \mu(x)\exp\Big\{-\int_0^x \mu(u)\,du\Big\}, \tag{4}$$

where it is assumed that the integral in the curly bracket is divergent as $x \to \infty$, so that

$$\int_0^\infty f(x)\,dx = 1.$$

Equation (2), or alternatively (3), can be solved in terms of the Laplace transform of $\Pi_t$, and provides, for example, both the characteristic function of the age distribution (put $z(u) = e^{i\theta u}$) and the expected renewal density (put z(u) = 1 except in the age interval (0, dt); note that the initial article is also included in the renewals at t = 0 if x = 0). However, such a 'portmanteau' solution is hardly required in this example, for the age distribution can alternatively and more directly be obtained from the renewal density, an article of age u being a survivor from a renewal at a time u previously.

Thus let the age-distribution function be $H(u \mid x, t)$ for an initial article age x at t = 0. Then clearly

$$H(u \mid x, t) = \int_0^u \exp\Big\{-\int_0^w \mu(v)\,dv\Big\}\,r(t - w, x)\,dw + \exp\Big\{-\int_x^{x+t} \mu(v)\,dv\Big\}, \tag{5}$$

where r(t, x) is the renewal density, the last term on the right is zero if u < x + t, and the further increment to the first term on the right is zero if w > t. As t increases, this gives a limiting age-distribution function independent of x

$$H(u) = r\int_0^u \exp\Big\{-\int_0^w \mu(v)\,dv\Big\}dw, \tag{6}$$

where r is the limiting constant renewal density (assumed here to exist; for weak conditions ensuring this, see Smith (1954)). Noting from (4) that

$$1 - F(u) = 1 - \int_0^u f(w)\,dw = \exp\Big\{-\int_0^u \mu(v)\,dv\Big\},$$

we have from (6)

$$H(\infty) = 1 = r\int_0^\infty [1 - F(u)]\,du = r\int_0^\infty u f(u)\,du$$

(in agreement with equation (2) of § 3-31). For simplicity we have assumed above that the density function f(u) exists, but the modifications necessary if this is not so are not difficult.
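The relation between the limiting renewal density and the mean lifetime, r = 1 / (integral of 1 - F(u) over all u), can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the text: it takes a linearly increasing 'death-rate' $\mu(v) = v$, for which $1 - F(u) = e^{-u^2/2}$ and the mean lifetime is $\sqrt{\pi/2}$, and integrates by the trapezoidal rule over a truncated range.

```python
import math

def limiting_renewal_density(cumulative_hazard, upper, steps=20000):
    """r = 1 / int_0^inf [1 - F(u)] du, with 1 - F(u) = exp(-cumulative_hazard(u)).
    `upper` truncates the integral; it must be large enough for the tail to be negligible."""
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0      # trapezoidal end-point weights
        total += weight * math.exp(-cumulative_hazard(k * h))
    return 1.0 / (total * h)

# Assumed example: mu(v) = v, so the cumulative hazard is u^2 / 2.
r = limiting_renewal_density(lambda u: 0.5 * u * u, upper=10.0)
print(r, math.sqrt(2.0 / math.pi))
```

The limiting age distribution then has density r[1 - F(u)], so older articles are rarer in proportion to the chance of surviving to that age.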


4-21 Queues. A renewal problem of a rather different kind arises in the servicing or 'renewal' of machines when the machines have to await the attention of an operator before being again ready for use. This problem then becomes one of a number classifiable under the generic title of 'queue' problems. Another familiar example is the one of customers waiting to be served, including such situations as aircraft waiting to land at a single air-station.

A number of useful results can be deduced by an elementary application of the renewal type of argument. Thus suppose we have an 'input' feeding a queue of customers waiting to be served at one 'counter', the queue size at time t being N(t). This includes the customer, if any, being served. Consider an instant $\tau$ when a customer is just leaving, so that

$$N(\tau + dt) = N(\tau) - 1 = Q,$$

say. Let the service-time for the next person be V, and suppose R new customers arrive during this time V. Then the size of queue when the next person leaves at time $\tau'$ is given by

$$Q' = Q - 1 + R + \delta(Q), \tag{1}$$

where

$$\delta(Q) = 0 \quad (Q \ne 0), \qquad \delta(Q) = 1 \quad (Q = 0).$$

If the average rate of input is too high, no stationary process can exist. We shall, however, assume stationarity with E(Q), E(Q²) finite. Averaging equation (1), we obtain under these assumptions

$$E(\delta) = 1 - E(R) = 1 - \rho, \tag{2}$$

say, where $\rho$ (< 1) is the mean number of arrivals during the mean service time E(V), or the 'traffic intensity'.† Also by squaring (1) before averaging and remembering that $\delta^2 = \delta$, $\delta Q = 0$, we find

$$2E\{Q(1 - R)\} = E\{(1 - R)^2\} + E\{\delta(2R - 1)\}.$$

We shall now assume further a constant (stochastic) input rate, such that R is a Poisson variable independent of Q with mean $\lambda V$, say. The last equation then gives

$$E(Q) = \rho + \tfrac{1}{2}[E(R^2) - \rho]/(1 - \rho).$$

Now

$$E(R^2) = E_V\{\lambda V + \lambda^2 V^2\} = \rho + \rho^2 + \lambda^2\sigma_V^2,$$

as $E(V) = \rho/\lambda$, var $(V) = \sigma_V^2$, say. Hence

$$E(Q) = \rho + \frac{\rho^2 + \lambda^2\sigma_V^2}{2(1 - \rho)}. \tag{3}$$

† For a proof that $\rho < 1$ is the stationarity condition, see Lindley (1952).
Consider next the waiting-time W for a customer. The customer leaves after time W + V, and Q customers will form behind him in this time at an average rate $\lambda(W + V)$. Hence

$$\lambda E(W + V) = \rho + \tfrac{1}{2}[\rho^2 + \lambda^2\sigma_V^2]/(1 - \rho),$$

and

$$E(W) = \frac{\rho^2 + \lambda^2\sigma_V^2}{2\lambda(1 - \rho)}.$$

This formula shows that for maximum efficiency the variance $\sigma_V^2$ should be zero, and that the waiting-time will be doubled on the average if the service-time has the exponential distribution characteristic of random intervals.
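The formulae for the mean queue size and mean waiting-time are simple enough to tabulate directly. The sketch below, added as an illustration, takes the mean queue size left behind a departing customer as $\rho + (\rho^2 + \lambda^2\sigma_V^2)/\{2(1 - \rho)\}$ and the mean waiting-time as $(\rho^2 + \lambda^2\sigma_V^2)/\{2\lambda(1 - \rho)\}$, and compares a constant service-time with an exponential one of the same mean. The function name and numerical values are assumptions of the example.

```python
def mean_queue_and_wait(lam, mean_v, var_v):
    """Mean queue size after a departure and mean waiting-time (before service)
    for Poisson input of rate lam and service-time with the given mean and variance."""
    rho = lam * mean_v                       # traffic intensity
    if rho >= 1.0:
        raise ValueError("no stationary queue exists unless rho < 1")
    excess = (rho ** 2 + lam ** 2 * var_v) / (2.0 * (1.0 - rho))
    mean_q = rho + excess
    mean_w = excess / lam                    # from lam * E(W + V) = E(Q), with lam * E(V) = rho
    return mean_q, mean_w

lam, mean_v = 0.5, 1.0                       # one arrival per 2 min., mean service 1 min.
q_const, w_const = mean_queue_and_wait(lam, mean_v, 0.0)          # constant service-time
q_expo, w_expo = mean_queue_and_wait(lam, mean_v, mean_v ** 2)    # exponential service-time
print(w_const, w_expo)                       # the exponential case is twice the constant case
```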

Similar arguments may be used to obtain the complete frequency distribution for the waiting-time W, which it will be noticed will have a non-zero probability of being zero. Let

$$\Pi(z) = E(z^Q) = E(z^{Q - 1 + R + \delta}), \tag{4}$$

from (1); or

$$\Pi(z) = (1 - \rho)E_{\delta=1}\{z^R\} + \rho E_{\delta=0}\{z^{Q - 1 + R}\},$$

as from (2) $1 - \rho$ is the chance that $\delta = 1$ or Q = 0. Now

$$E\{z^R\} = E_V(e^{\lambda V(z - 1)}) = H(z),$$

say, as R is a Poisson variable with mean $\lambda V$ for given V, and further $H(z) = L(\lambda(1 - z))$, where

$$L(\psi) = E_V(e^{-\psi V}) = \int_0^\infty e^{-\psi v}\,dB(v) \tag{5}$$

is the Laplace transform of the service-time distribution B(v). Hence

$$\Pi(z) = (1 - \rho)H(z) + \rho H(z)E_{\delta=0}\{z^{Q-1}\}.$$

But also

$$\Pi(z) = (1 - \rho) + \rho E_{\delta=0}\{z^Q\}.$$

From these two equations

$$\Pi(z) = \frac{(1 - \rho)(1 - z)}{1 - z/H(z)} = \frac{(1 - \rho)(1 - z)}{1 - z/L\{\lambda(1 - z)\}}. \tag{6}$$

For the waiting-time distribution function C(w) define

$$\Lambda(\psi) = E(e^{-\psi W}) = \int_0^\infty e^{-\psi w}\,dC(w).$$

From the stochastic relation between Q and W + V, we have

$$E_Q\{z^Q\} = E_{W+V}\{e^{\lambda(W + V)(z - 1)}\},$$

whence

$$\Pi(z) = \Lambda(\lambda(1 - z))\,L(\lambda(1 - z)),$$

or

$$\Lambda(\psi) = \frac{1 - \rho}{1 - \lambda[1 - L(\psi)]/\psi}, \tag{7}$$

a result due to Pollaczek. As a particular case, if the service-time is a purely random interval, so that its distribution is exponential, the continuous component of the waiting-time distribution has density

$$\rho\nu e^{-\nu w} \qquad (\nu = \lambda/\rho - \lambda); \tag{8}$$

this case was considered by Erlang with particular reference to telephone 'traffic'. For this case we have also

$$H(z) = \frac{1}{1 + \rho - \rho z}, \qquad \Pi(z) = \frac{1 - \rho}{1 - \rho z}, \tag{9}$$

and the size of queue left by a departing customer follows the geometric law $(1 - \rho)\rho^q$ (q = 0, 1, 2, ...).


[Fig. 5. Realization of a queue with a single server, starting with one customer at time zero (random intake and serving-time). Size of queue plotted against time (min.), 0 to 100.]

In table 2 the results of an artificial queue realization for the case of random intake (mean interval 2 min.) and random serving time (mean interval 1 min., so that $\rho = \tfrac{1}{2}$) are shown. The agreement with theory appears fairly reasonable; it should be noted, however, that a complete test of agreement would involve the principles developed in Chapter 8 (see especially § 8-21). The actual mean obtained for the arrival intervals was 1.96, but that for service-times came out by chance at the somewhat low figure of 0.82, which probably accounts for the rather low mean waiting-time of 0.78 for the customer.


Table 2. Realization of a queue with random intake and serving time (ρ = ½)

Queue size          Frequencies
(after service)     Observed    Theoretical
0                     54          50.0
1                     19          25.0
2                     13          12.5
3                      9           6.3
4                      5           3.1
5 or more              0           3.1
Total                100         100
Mean                   0.92        1.00

Waiting-time        Frequencies
of customer         Observed    Observed    Theoretical
0                     54          46          50
Non-zero:
0-1                   20          23          19.7
1-2                    9          10          11.9
2-3                    8           7           7.2
3-4                    6           7           4.4
4-5                    2           4           2.7
5-6                    0           1           1.6
6-7                    1           0           1.0
7-8                    -           0           0.6
8-9                    -           1           0.4
9-10                   -           0           0.2
10-11                  -           0           0.1
11-12                  -           0           0.1
12 or more             -           1           0.1
Total                100         100         100
Mean                   0.78        1.15        1.00
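The theoretical columns of Table 2 can be reproduced from the geometric law $(1 - \rho)\rho^q$ for the queue size and, for the waiting-time, from a probability $1 - \rho$ of zero wait together with the continuous density $\rho\nu e^{-\nu w}$, $\nu = \lambda/\rho - \lambda$. The short sketch below is an added illustration; the variable names are assumptions, and $\lambda$ = 0.5 per minute corresponds to the mean arrival interval of 2 min.

```python
import math

rho = 0.5                 # traffic intensity used in Table 2
lam = 0.5                 # arrivals per minute (mean interval 2 min.)
nu = lam / rho - lam      # decay rate of the continuous waiting-time component

# Queue size after service: frequencies per 100 customers for q = 0..4, then "5 or more".
queue_freq = [100.0 * (1.0 - rho) * rho ** q for q in range(5)]
queue_freq.append(100.0 - sum(queue_freq))

# Waiting-time: zero wait, then unit intervals 0-1, 1-2, ..., 11-12, then "12 or more".
wait_zero = 100.0 * (1.0 - rho)
wait_bins = [100.0 * rho * (math.exp(-nu * k) - math.exp(-nu * (k + 1))) for k in range(12)]
wait_tail = 100.0 * rho * math.exp(-nu * 12)

print([round(f, 1) for f in queue_freq])
print(round(wait_zero, 1), [round(f, 1) for f in wait_bins], round(wait_tail, 1))
```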


A more general method of investigating the waiting-time distribution (covering more general arrival-time distributions) is as follows. Let us label the customers so that the first waits time $W_1$, the second arrives at a time T later and waits $W_2$. Then if the serving-time for the first customer is V,

$$W_1 + V = W_2 + T$$

for $W_1 + V \ge T$, or if V = U + T,

$$W_2 = \begin{cases} W_1 + U & (W_1 + U \ge 0), \\ 0 & (W_1 + U < 0). \end{cases}$$
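The relation between successive waiting-times, $W_2 = W_1 + U$ if this is non-negative and zero otherwise, with U = V - T, gives a direct way of generating waiting-times from given serving-times and arrival intervals. The sketch below is an illustration with hypothetical names; it assumes that the first customer finds the server free and so waits zero time.

```python
def waiting_times(service, gaps):
    """service[i]: serving-time V of customer i.
    gaps[i]: time T between the arrivals of customers i and i + 1.
    Returns the waiting-times of customers 0, 1, ..., len(gaps)."""
    waits = [0.0]                       # assumption: the first customer does not wait
    for v, t in zip(service, gaps):
        u = v - t                       # U = V - T
        waits.append(max(waits[-1] + u, 0.0))
    return waits

# Small hand-checkable example (made-up figures, in minutes).
print(waiting_times([3.0, 1.0, 2.0], [1.0, 4.0, 1.0]))
```

In this example the second customer waits 2 min., the third arrives to find the server idle, and the fourth waits 1 min.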
It is convenient to introduce a further variable Z which equals W for W > 0 and equals minus the time the server was previously unemployed if W = 0. Then

$$Z_2 = \begin{cases} Z_1 + U & (Z_1 > 0), \\ U & (Z_1 < 0). \end{cases}$$

Let g(u) be the frequency function of U and h(z) that of Z (this in general will exist, though that for W will not owing to the non-zero probability that W = 0). Then these functions must satisfy the integral equation

$$h_2(z_2) = \int_0^\infty h_1(z_1)\,g(z_2 - z_1)\,dz_1 + P_1 g(z_2),$$

where $P_1$ is the chance that $Z_1 < 0$. In the stationary case this is equivalent to the integral equation, of Wiener-Hopf type,

$$h(z) = \int_0^\infty h(y)\,g(z - y)\,dy + P g(z), \tag{10}$$

where

$$P = \int_{-\infty}^0 h(y)\,dy.$$


This equation, due to Lindley, determines in principle h(z) from the distribution of U = V - T, though it is rather awkward to handle for more general arrival-time distributions. Its solution has been discussed in this context by Smith (1953), who has shown in particular that if the service-time distribution is exponential so is that of the (non-zero) waiting-time, whatever the arrival-time distribution; this remarkable result was later shown by D. G. Kendall (1953) to apply even in the case of more than one server. Kendall has also pointed out that the preceding methods make use of the same general principle of extracting some convenient Markov sequence from the complete stochastic process, which is not Markovian in the general case unless extra variables are introduced (akin to the age in the simple renewal process of § 4-2).

More than one server. For the case of more than one 'server', we shall return here to the simplifying assumption of random input, e.g. telephone 'traffic' with a number s of lines. As $s \to \infty$, if we assume further a random time-interval for occupying a line, this problem is stochastically identical with that of equilibrium under a death-rate $\mu$ and an immigration rate $\lambda$. The equilibrium distribution for the number of occupied lines is thus

$$p_n = e^{-\lambda/\mu}(\lambda/\mu)^n/n!. \tag{11}$$

It should be noticed that this distribution can be obtained at once from the equation for $p_n(t)$

$$\partial p_n(t)/\partial t = -(\lambda + \mu n)p_n(t) + \lambda p_{n-1}(t) + (n + 1)\mu p_{n+1}(t),$$

for in equilibrium

$$p_{n+1} = [(\lambda + \mu n)p_n - \lambda p_{n-1}]/[\mu(n + 1)],$$

with $p_1 = \lambda p_0/\mu$.

However, if the number s of lines is finite, the number n of customers either making use of these lines or waiting to do so is governed by the modified equilibrium equation for $n \ge s$

$$0 = -(\lambda + \mu s)p_n + \lambda p_{n-1} + s\mu p_{n+1},$$

whence

$$p_n = (\lambda/\mu)^n p_0/n! \qquad (n \le s), \tag{12}$$

but

$$p_n = (\lambda/\mu)^n p_0/[s!\,s^{n-s}] \qquad (n \ge s). \tag{13}$$

The number of waiting customers is n - s for n > s. The sum $\sum p_n/p_0$ converges if $\lambda/\mu < s$.
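The equilibrium distribution with a finite number s of lines can be computed by normalising the two forms $(\lambda/\mu)^n/n!$ for $n \le s$ and $(\lambda/\mu)^n/(s!\,s^{n-s})$ for $n \ge s$, the second of which is a geometric tail with ratio $\lambda/(\mu s)$. The sketch below is an added illustration with hypothetical names; it sums the tail in closed form, which is valid only under the convergence condition $\lambda/\mu < s$.

```python
import math

def line_occupancy(lam, mu, s, n_max):
    """Equilibrium probabilities p_0..p_n_max for random input (rate lam),
    random holding times (rate mu) and s lines, with waiting allowed."""
    a = lam / mu
    if a >= s:
        raise ValueError("the sum of p_n / p_0 diverges unless lam/mu < s")
    w = [a ** n / math.factorial(n) for n in range(s + 1)]   # weights for n <= s, relative to p_0
    tail = w[s] * (a / s) / (1.0 - a / s)                     # sum of the geometric tail, n > s
    p0 = 1.0 / (sum(w) + tail)
    probs = []
    for n in range(n_max + 1):
        if n <= s:
            probs.append(p0 * w[n])
        else:
            probs.append(p0 * a ** n / (math.factorial(s) * s ** (n - s)))
    return probs

# With a single line the distribution reduces to the geometric law (1 - rho) rho^n.
print(line_occupancy(0.5, 1.0, 1, 5))
print(line_occupancy(2.0, 1.0, 3, 5))
```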

Finally, we shall consider a servicing problem where there are m operators available for repairing any of ms machines. We shall again consider only the simple case of random breakdowns (rate $\lambda$ per machine) and random service-times (completed at rate $\mu$). It is evident that a single group of m operators to ms machines is in principle more efficient than m separate groups of 1 operator to s machines, because the possibility of an operator being idle while a machine in another group requires attention is excluded if all operators are pooled. To investigate the gain in detail, we set up the equilibrium equations for the number of machines (not operators) idle:

$$\left.\begin{aligned}
0 &= -ms\lambda p_0 + \mu p_1, \\
[(ms - n)\lambda + n\mu]\,p_n &= (ms - n + 1)\lambda p_{n-1} + (n + 1)\mu p_{n+1} \quad (1 \le n < m), \\
[(ms - n)\lambda + m\mu]\,p_n &= (ms - n + 1)\lambda p_{n-1} + m\mu p_{n+1} \quad (m \le n \le ms),
\end{aligned}\right\} \tag{14}$$

whence for $0 \le n < m$

$$(n + 1)\mu p_{n+1} = (ms - n)\lambda p_n,$$

while for $n \ge m$,

$$m\mu p_{n+1} = (ms - n)\lambda p_n.$$

For m = 1, this gives

$$p_n = \frac{s!}{(s - n)!}\Big(\frac{\lambda}{\mu}\Big)^n p_0. \tag{15}$$

In the particular case when s becomes large, but the expected rate of breakdown remains finite, i.e. $s\lambda \to S$, say, the distribution (15) becomes

$$p_n = (S/\mu)^n p_0,$$

and the chance of no machine out of use is $1 - S/\mu$. This checks of course with the queue problem with the machines taking the role of customers. The expected number of machines in the waiting-line (excluding any machine being serviced) comes out to be $\rho^2/(1 - \rho)$, where $\rho = S/\mu$.

In the case of general m, we shall consider the equations (14) in the same limiting situation of s large but $s\lambda \to S$. With $\rho = S/\mu$, the equations become

$$\left.\begin{aligned}
\rho m p_0 &= p_1, \\
(\rho m + n)\,p_n &= \rho m p_{n-1} + (n + 1)p_{n+1} \quad (1 \le n < m), \\
(\rho + 1)\,p_n &= \rho p_{n-1} + p_{n+1} \quad (n \ge m).
\end{aligned}\right\} \tag{16}$$

The last equation gives

$$p_n = \rho^{n-m}p_m \qquad (n \ge m), \tag{17}$$

while for $1 \le n \le m$,

$$p_n = (\rho m/n)\,p_{n-1}. \tag{18}$$

These relations determine the form of the distribution for $n \ge m$ and $\le m$. For example, for m = 2,

$$p_1 = 2\rho p_0, \qquad p_2 = 2\rho^2 p_0, \qquad p_n = \rho^{n-2}p_2 = 2\rho^n p_0,$$

whence

$$p_0[1 + 2\rho/(1 - \rho)] = 1,$$

or

$$p_0 = (1 - \rho)/(1 + \rho).$$

The expected number of machines in the waiting-line under these conditions is $2\rho^3/(1 - \rho^2)$ and the ratio to twice the number for m = 1 is $\rho/(1 + \rho)$, indicating the gain in efficiency already referred to. As m further increases, the ratio of the expected number of waiting machines to the total expected number for the m = 1 individual groups tends eventually to zero.
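The gain from pooling operators can be tabulated from the limiting relations $p_n = (\rho m/n)p_{n-1}$ for $n \le m$ and $p_n = \rho^{n-m}p_m$ for $n \ge m$. The sketch below is an added illustration with hypothetical names; it truncates the distribution at a large n, which is harmless for $\rho < 1$, and compares the expected waiting-line for m pooled operators with m times the figure for a single operator.

```python
def waiting_line_mean(rho, m, n_max=2000):
    """Expected number of machines waiting (not being serviced) for m pooled
    operators, in the limit of many machines with traffic intensity rho < 1
    per operator."""
    w = [1.0]                                  # weights relative to p_0
    for n in range(1, n_max + 1):
        if n <= m:
            w.append(w[-1] * rho * m / n)      # p_n = (rho m / n) p_{n-1}
        else:
            w.append(w[-1] * rho)              # p_n = rho p_{n-1}
    total = sum(w)
    waiting = sum((n - m) * w[n] for n in range(m + 1, n_max + 1))
    return waiting / total

rho = 0.5
single = waiting_line_mean(rho, 1)             # rho^2 / (1 - rho)
for m in (1, 2, 4, 8):
    pooled = waiting_line_mean(rho, m)
    print(m, pooled, pooled / (m * single))    # ratio falls as m increases
```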


4-3 Population growth as a multiplicative process


In § 3-4 we considered a simple multiplicative birth-and-death process with constant birth- and death-rates $\lambda$ and $\mu$ independent of time. This was extended to functions $\lambda(t)$ and $\mu(t)$ depending on the time, but for human and most animal populations it is essential to consider the variation of $\lambda$ and $\mu$ with the age of the individual concerned, just as the energy of a particle of a cascade shower had to be introduced in § 3-42.

Before formulating such a process, we note briefly the usual 
treatment, which refers merely to expected numbers. To simplify 
the problem, we consider the standard case where the process is 
temporally homogeneous, and where the females only are 
enumerated. The expected female birth-rate r(t) then satisfies 
an integral equation of renewal type, viz. 

$$r(t) = r_0(t) + \int_0^t s(\tau)\,r(t - \tau)\,d\tau, \tag{1}$$

where s(t) is the expected rate of female offspring at time t of a female born at time 0, and $r_0(t)$ depends on the value of r(t) prior to t = 0. The function s(t) is the product of the rate of offspring at time t to a female aged t and the chance of survival to time t, 1 - F(t). This equation may be solved either iteratively in terms of successive generations, or by an expansion of the type referred to in the theoretical solution quoted in § 2-11.

The linearity of equation (1) ensures its validity as an expecta- 
tion equation under fairly general conditions, though it neglects 
the effect of a changing sex ratio, only females (or males) being 
considered, and is liable to give misleading results for com- 
paratively new countries (like Australia) with a high immigration 
rate and anomalous sex ratio. In order to represent population 
growth as a stochastic process, it is necessary to give a more 
explicit and rather idealized specification of the statistical 
mechanism of growth. Before introducing the complication of 
the continuous age-structure, we may indicate the stochastic aspect by reference to the extinction question, treating this quite simply by keeping to 'generations' as a strictly discrete 'time' variable. The methods of § 2-3 may then be used to show that, on the assumption of independent female genealogical lines, a particular line will become extinct if R, the net reproduction rate defined as the mean of the female replacement distribution per female, is not greater than 1. Extinction similarly follows for any finite number of lines. This result implies the ultimate extinction of the population in time if we note that while generations overlap in time, the spread of generations (which for each line is a 'random walk' or additive process) is $O(\sqrt{t})$, and hence o(t), as t increases.

When R = 1, the population mean size is ultimately stationary, but we see that this does not imply any more complete stochastic stationarity. Of course whether the divergence of the actual or expected size is practically important depends on the order of magnitude involved. The coefficient of variation, that is, the standard deviation divided by the mean, for a population with constant mean will be of order $\sqrt{(t/n_0)}$, where $n_0$ is the initial size (see § 2-3). For a human population of 50 million and t (measured in generations of about 25 years) equal to 2 units or 50 years, $\sqrt{(t/n_0)}$ is 1/5000 and fluctuations are negligible; on the other hand, for an animal population of 100 with a 1-year life cycle, $\sqrt{(t/n_0)}$ is about 0.3 in 10 years. When R > 1, the chance of extinction rapidly tends to zero with the increase in size (cf. the chance of extinction of favourable gene mutations referred to in § 2-3), but may still be appreciable for small numbers and in particular for n = 1, i.e. the descendants of a single individual, as was illustrated in § 2-11.

Exact treatment of age-structure. We return now to the detailed specification problem, and make the following simplifying assumptions about the mechanism of the process:

(a) the sub-populations generated by any two co-existing individuals develop independently of each other;

(b) an individual of age x existing at time t has a chance $\lambda(x)\Delta t + o(\Delta t)$ of producing one new individual of age zero during the interval $(t, t + \Delta t)$, independently of previous events (multiple births, the planning of family size, etc., are ignored);

(c) an individual has a chance $\mu(x)\Delta t + o(\Delta t)$ of dying during the same interval.

The formalism of § 3-42 (cf. also § 4-2) gives for the transitions in the interval dt

$$z(x) \to z(x + dt)\big(1 - [\lambda(x) + \mu(x)]\,dt\big) + z(0)\,z(x + dt)\,\lambda(x)\,dt + \mu(x)\,dt,$$

whence the 'backward' equation for the characteristic functional of the number $N(u \mid x, t)$ of age less than or equal to u at time t, given one individual aged x at t = 0, expressed for convenience in terms of $z(u) = \exp i\theta(u)$ as the probability-generating functional

$$\Pi_t(z(u) \mid x) = E\Big\{\exp i\int\theta(u)\,dN(u \mid x, t)\Big\},$$

becomes

$$\frac{\partial\Pi_t(x)}{\partial t} = \frac{\partial\Pi_t(x)}{\partial x} + \lambda(x)\Pi_t(x)[\Pi_t(0) - 1] + \mu(x)[1 - \Pi_t(x)], \tag{2}$$

or in integral equation form

$$\Pi_t(x) = \int_0^t [\mu(x + u) + \lambda(x + u)\,\Pi_{t-u}(x + u)\,\Pi_{t-u}(0)]\,q(x + u \mid x)\,du + q(x + t \mid x)\,z(x + t), \tag{3}$$

where $q(x + t \mid x)$, representing the chance of no birth or death in the period t for an individual aged x at t = 0, is given by

$$q(x + t \mid x) = \exp\Big\{-\int_x^{x+t}[\lambda(u) + \mu(u)]\,du\Big\}. \tag{4}$$

The general solution of (2) or (3), at least in closed form, is unknown, though an iterative form of solution in terms of the contributions from successive generations is fairly readily obtained. In fact, if we write down in place of (3) the integral equation obtained by considering deaths alone as renewal or regenerative instants, we have

$$\Pi_t(x) = \int_0^t \mu(x + u)\,p(x + u \mid x)\,\Psi(u; x)\,du + p(x + t \mid x)\,z(x + t)\,\Psi(t; x), \tag{5}$$

where

$$p(x + t \mid x) = \exp\Big\{-\int_x^{x+t}\mu(u)\,du\Big\} \tag{6}$$

is the chance of the initial individual not dying in the period t, and $\Psi(u; x)$ denotes the functional at time t for the progeny born between 0 and u of an individual of initial age x who does not die. Now such an individual may be regarded as an 'immigration source' of progeny, by the principles of § 3-41, so that

$$\Psi(u; x) = \exp\Big\{\int_0^u \lambda(x + v)[\Pi_{t-v}(0) - 1]\,dv\Big\}. \tag{7}$$

The new integral equation obtained by substituting (7) in (5) has the advantage that the functional in (7) is composed of contributions from new births. Thus if we write $\Pi^{(n)}$ as the contribution to $\Pi$ of all generations up to the nth (the initial ancestor representing the zeroth generation), then on the right-hand side of (5) we must write $\Pi^{(n-1)}$, and the iterative form of solution of the first integral equation (3) is consequently

$$\Pi_t^{(n)}(x) = z(x + t)\,p(x + t \mid x)\exp\Big\{\int_0^t \lambda(x + v)[\Pi_{t-v}^{(n-1)}(0) - 1]\,dv\Big\} + \int_0^t \mu(x + u)\,p(x + u \mid x)\exp\Big\{\int_0^u \lambda(x + v)[\Pi_{t-v}^{(n-1)}(0) - 1]\,dv\Big\}du. \tag{8}$$
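In the particular case of $\lambda(x)$, $\mu(x)$ constant, the complete solution of (2) or (3) is known. We note from (3) that the general solution is linear in z(x + t), and we may write

$$\Pi_t(x) = 1 + [z(x + t) - 1]\,U + V,$$

where U and V do not involve z(x + t), while V vanishes and U = 1 when t = 0. We easily obtain, for $\lambda(x) = \lambda$, $\mu(x) = \mu$, the pair of equations for U, V

$$\frac{\partial U}{\partial t} = -\mu U + \lambda\big([z(t) - 1]U + V\big)U,$$

$$\frac{\partial V}{\partial t} = -\mu V + \lambda\big([z(t) - 1]U + V\big)(V + 1),$$

whence

$$\frac{\partial}{\partial t}\Big(\frac{V}{U}\Big) = \lambda\Big\{[z(t) - 1] + \frac{V}{U}\Big\}, \qquad \frac{\partial}{\partial t}\Big(\frac{1}{U}\Big) = \frac{\mu}{U} - \lambda\Big\{[z(t) - 1] + \frac{V}{U}\Big\}.$$

From the first of these equations,

$$\frac{V}{U} = e^{\lambda t}\int_0^t [z(u) - 1]\,\lambda e^{-\lambda u}\,du,$$

and substituting this result in the second equation, integrating by parts and using the initial value of U, we obtain

$$\frac{e^{-\mu t}}{U} = 1 - \int_0^t [z(u) - 1]\,\frac{\lambda}{\lambda - \mu}\,e^{-\mu u}\,[\lambda e^{(\lambda - \mu)(t - u)} - \mu]\,du.$$

Hence we find

$$\Pi_t(x) = 1 + \frac{(\lambda - \mu)\Big\{[z(x + t) - 1]\,e^{-\mu t} + \lambda\displaystyle\int_0^t [z(u) - 1]\,e^{(\lambda - \mu)t - \lambda u}\,du\Big\}}{(\lambda - \mu) - \lambda\displaystyle\int_0^t [z(u) - 1]\,[\lambda e^{(\lambda - \mu)(t - u)} - \mu]\,e^{-\mu u}\,du}. \tag{9}$$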
Once the result (9) is obtained, it gives the simplest method of obtaining the moment formulae. Thus for ages less than t, the initial ancestor does not contribute (cf. the age distribution in the renewal problem of § 4-2). Putting z(x + t) = 1 in (9), otherwise $z(u) = e^{i\theta(u)}$, and taking logarithms, we find for the cumulant functional

$$K_t = \log\Big(1 - \frac{\lambda}{\lambda - \mu}L_1\Big) - \log\Big(1 - \frac{\lambda}{\lambda - \mu}L_2\Big), \tag{10}$$

where

$$L_1 = \int_0^t [e^{i\theta(u)} - 1]\,[\mu e^{(\lambda - \mu)t - \lambda u} - \mu e^{-\mu u}]\,du,$$

$$L_2 = \int_0^t [e^{i\theta(u)} - 1]\,[\lambda e^{(\lambda - \mu)t - \lambda u} - \mu e^{-\mu u}]\,du.$$

Expanding the logarithms, and $e^{i\theta(u)}$, we obtain any required moment density function (§ 3-42). It is convenient to define in relation to the cumulant functional (10) the cumulant densities†

$$\left.\begin{aligned}
E\{dN(u, t)\} &= \alpha(u, t)\,du = \operatorname{var}\{dN(u, t)\}, \\
\operatorname{cov}\{dN(u, t), dN(v, t)\} &= \beta(u, v, t)\,du\,dv \quad (u \ne v),
\end{aligned}\right\} \tag{11}$$

(note that $\beta(u, v) + \alpha(u)\alpha(v)$ is the product-density of order two defined in § 3-42).

† It is not perhaps immediately clear that the order of magnitude of the moments assumed in (11) is valid even when u = x + t is excluded, especially in view of the special role of u = 0 in birth and renewal processes, but this can be checked if required, for example, from equation (5).

Then, apart from excluding for convenience the anomalous direct contribution from the initial ancestor, we have in general

$$K_t = \int_0^\infty \alpha(u, t)\,[i\theta(u) - \tfrac{1}{2}\theta^2(u)]\,du - \tfrac{1}{2}\int_0^\infty\!\!\int_0^\infty \beta(u, v, t)\,\theta(u)\,\theta(v)\,du\,dv + \ldots. \tag{12}$$

In the particular case (10), we find

$$\alpha(u, t) = \lambda e^{(\lambda - \mu)t - \lambda u} \qquad (0 < u < t), \tag{13}$$

$$\beta(u, v, t) = \frac{\lambda^2(\lambda + \mu)}{\lambda - \mu}\,e^{2(\lambda - \mu)t - \lambda(u + v)} - \frac{\lambda^2\mu}{\lambda - \mu}\,e^{(\lambda - \mu)t}\,[e^{-\lambda u - \mu v} + e^{-\mu u - \lambda v}] \qquad (u < v < t), \tag{14}$$

the last formula holding also by symmetry for v < u.
These formulae can alternatively be derived directly by the
methods developed in § 3.42 (cf. D. G. Kendall, 1949). The
relevant moment equations will appear more familiar if derived
from the ‘forward’ equation, which is (equation (11) of § 3.42)


where $d\Pi_t(z(u))=\Pi_t(z^*(u))-\Pi_t(z(u))$,
$$z^*(u)=z(u+dt)+\lambda(u)\,dt\,[z(0)-1]\,z(u+dt)+\mu(u)\,dt\,[1-z(u+dt)],$$
or, in terms of $K_t(i\phi(u))\equiv\log E\{\exp[i\int\phi(u)\,dN(u,t)]\}$,
$$dK_t(i\phi(u))=K_t(i\phi^*(u))-K_t(i\phi(u)), \tag{15}$$
where
$$i\phi^*(u)=i\phi(u+dt)+dt\,[\lambda(u)\,(e^{i\phi(0)}-1)+\mu(u)\,(e^{-i\phi(u)}-1)].$$


We shall not discuss these equations further here, except to 
check their consistency with the integral equation (1). We 
have from (15) 


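$$dK_t(i\phi(u))=\int\alpha(u,t)\,[i\phi(u+dt)-i\phi(u)+dt\,(\lambda(u)\,i\phi(0)-\mu(u)\,i\phi(u))]\,du+\dots,$$

whence
$$\frac{\partial\alpha}{\partial t}+\frac{\partial\alpha}{\partial u}+\mu\alpha=0\quad(u>0),\qquad \alpha(0,t)=\int\lambda(u)\,\alpha(u,t)\,du. \tag{16}$$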
The first equation gives

$$\alpha(u,t)=p(u)\,\alpha(0,t-u)\quad(0<u<t),$$

where (cf. equation (6)) $p(u)=\exp\{-\int_0^u\mu(v)\,dv\}$. The second
equation in (16) then gives an integral equation identical with
(1) if we write
7()2(0,), s(t) =p(t) A(t), 


and re(t) zi A(u) a(u, t) du — A(x +t) exp - [^ do) E 


The effect of immigration. The previous methods of allowing
for immigration can be extended to cover the above specification.
We shall consider only the age-independent case where $\lambda(u)=\lambda$,
$\mu(u)=\mu$, and also we assume an immigration (constant) rate $\nu$,
the ages of immigrants being taken for simplicity all zero. As
we shall be interested in the limiting population, we consider also
an initial population of size zero. Then at time t the new
moment-generating functional $L_t$ will be given by


log L,= 2 [eXuo — 1]du, 
0 


t =y] 
or = [2 + cal [eto — 1] [u — Ae-tn-9) t] e- nw a E . (17) 
AJo 
For $\mu>\lambda$, the limiting stationary distribution is given by

$$L=\left[1-\frac{\lambda\mu}{\mu-\lambda}\int_0^\infty\left(e^{i\phi(u)}-1\right)e^{-\mu u}\,du\right]^{-\nu/\lambda}. \tag{18}$$
Any required properties of the ultimate population can be
obtained from (18). For example, if we write

$$i\phi(u)=\theta_1+\theta_2f(u),$$

we obtain the joint m.g.f. of the total number N of individuals
and the total sum S of the function f(U) over all the random
ages U in the population. We find

$$M(\theta_1,\theta_2)=\left[\frac{\mu-\lambda}{\mu-\lambda e^{\theta_1}\psi(\theta_2)}\right]^{\nu/\lambda}, \tag{19}$$
where
$$\psi(\theta_2)=\int_0^\infty e^{\theta_2f(u)}\,\mu e^{-\mu u}\,du. \tag{20}$$
The marginal distribution of N is of course a negative binomial
(as we have assumed the rates $\lambda$, $\mu$, constant over the ages).
Further, the conditional distribution of S/n for given N=n is
given by the m.g.f. $\psi^n(\theta/n)$ (n > 0). This is exactly the same as
if the n individuals were picked at random from an infinite
population with the age distribution $\mu e^{-\mu u}\,du$, a result which
would be anticipated as all the individuals enter the population
at age zero (whether by births or immigration) and die off at
rate $\mu$.
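The negative binomial marginal distribution of N can be tabulated directly. The following is a minimal sketch, assuming the usual parametrization for a linear birth, death and immigration process (index $\nu/\lambda$, ratio $\lambda/\mu$), which corresponds to the p.g.f. $[(\mu-\lambda)/(\mu-\lambda z)]^{\nu/\lambda}$; the function names are illustrative and not part of the text.

```python
def stationary_size_pmf(lam, mu, nu, n_max):
    """Return [P(N = 0), ..., P(N = n_max)] for the stationary population.

    Assumes constant birth-rate lam, death-rate mu > lam and immigration
    rate nu, so that N is negative binomial with index nu/lam and ratio lam/mu.
    """
    if not (mu > lam > 0 and nu > 0):
        raise ValueError("need mu > lam > 0 and nu > 0")
    rho = lam / mu            # ratio of birth-rate to death-rate
    index = nu / lam          # negative binomial index
    pmf = [(1.0 - rho) ** index]
    for n in range(n_max):
        # ratio of successive negative binomial terms
        pmf.append(pmf[-1] * rho * (index + n) / (n + 1))
    return pmf


def stationary_mean_size(lam, mu, nu):
    """Mean of the stationary population size, nu / (mu - lam)."""
    return nu / (mu - lam)
```

With `lam = 1`, `mu = 2`, `nu = 0.5` the tabulated probabilities sum to one (to rounding) and their mean agrees with `stationary_mean_size`.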


4.31 Growth and mutation in bacterial populations. In
the case of bacterial population growth, the standard model is
rather different from that discussed in the last section, for the
bacterium usually splits into two new offspring, so that, if the
bacteria are in favourable conditions under which the death-rate
can be neglected, the transition is represented by

$$z(x)\to z(x+dt)\,(1-\lambda(x)\,dt)+z^2(0)\,\lambda(x)\,dt,$$

whence
$$\frac{\partial\Pi_t}{\partial t}=\lambda(x)\,[\Pi_t^2(0)-\Pi_t]+\frac{\partial\Pi_t}{\partial x}, \tag{1}$$

or in integral equation form

$$\Pi_t(x)=\int_0^t\lambda(x+u)\,\Pi_{t-u}^2(0)\,P(x+u\mid x)\,du+z(x+t)\,P(x+t\mid x), \tag{2}$$

where $P(x+t\mid x)=\exp\{-\int_x^{x+t}\lambda(u)\,du\}$.


These equations allow for any age-dependent probability of 
division, but assume that bacteria, once produced, are in- 
dependent, so that their ‘lifetimes’ before division are not 
influenced either by other bacteria or by the previous values 
of lifetimes in their own genealogical history. 

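The solution of (1) or (2) is unknown in general, but if the
lifetime U, with frequency distribution $\lambda(u)\,P(u\mid0)\,du$, is
approximately represented as the resultant of a number of
independent stages or phases $U_1, U_2,\dots,U_k$, each with exponential
distribution of mean $1/\lambda_1, 1/\lambda_2,\dots,1/\lambda_k$ respectively, and we
consider merely the total number N(t) of individuals, beginning
with one individual aged zero, the Laplace transform of (2) may
be written

$$M(\psi)=M_2(\psi)\prod_{i=1}^k\left(1+\frac{\psi}{\lambda_i}\right)^{-1}+\frac z\psi\left[1-\prod_{i=1}^k\left(1+\frac{\psi}{\lambda_i}\right)^{-1}\right],$$

where $M_2(\psi)$ is the transform of $\Pi_t^2$, or

$$\prod_{i=1}^k\left(1+\frac{\psi}{\lambda_i}\right)M(\psi)=M_2(\psi)+\frac z\psi\left[\prod_{i=1}^k\left(1+\frac{\psi}{\lambda_i}\right)-1\right].$$

Transforming back and noting that $1+\psi/\lambda_i\equiv1+(1/\lambda_i)\,\partial/\partial t$, we
obtain an equivalent differential equation

$$\prod_{i=1}^k\left(1+\frac1{\lambda_i}\frac{\partial}{\partial t}\right)\Pi_t=\Pi_t^2, \tag{3}$$
with initial conditions
$$\Pi_0(z)=z,\qquad(\partial/\partial t)^i\,\Pi_t(z)=0\ \text{ at } t=0\quad(1\le i<k).$$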


From (3) soluble equations can be obtained at least for the
moments. For example, in the case where all $\lambda_i=k\lambda$, we obtain
for the first two factorial moments of N, $m_1$ and $m_2$, say,

$$[(1+D/(k\lambda))^k-2]\,m_1=0,\qquad[(1+D/(k\lambda))^k-2]\,m_2=2m_1^2, \tag{4}$$

where $D\equiv\partial/\partial t$. In the particular case k=1 we are back to the
simple birth process

$$\left(1+\frac1\lambda\frac{\partial}{\partial t}\right)\Pi_t=\Pi_t^2, \tag{5}$$
with complete solution
$$\Pi_t=ze^{-\lambda t}/[1-z(1-e^{-\lambda t})]. \tag{6}$$

From the more general equations (4) D. G. Kendall (1952) has
found that, as t becomes large, $m_1\sim2^{1/k}\exp(a_k\lambda t)/(2a_k)$, where
$a_k=k(2^{1/k}-1)\sim\log_e2$ for large k, and the coefficient of variation
$\sigma/m_1\sim A_k/\sqrt k$, where $A_k\sim\sqrt2\,\log_e2$ for large k. From experimental
data by Kelly and Rahn for Bacterium aerogenes, with
a mean ‘lifetime’ of about half an hour and a coefficient of
variation about this mean of 15-30%, a value of k about 30
appears to provide an appropriate model in this case.
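The quantities in Kendall's asymptotic result are easy to evaluate numerically. The sketch below assumes the k-stage model with all $\lambda_i=k\lambda$; the growth exponent is the root of $(1+a/k)^k=2$ implied by (4), and the lifetime coefficient of variation $1/\sqrt k$ follows from adding k independent exponential stages. The function names are illustrative only.

```python
import math


def growth_exponent(k):
    """a_k = k (2^(1/k) - 1), the asymptotic growth rate in units of lambda."""
    # expm1 avoids cancellation when k is large
    return k * math.expm1(math.log(2.0) / k)


def mean_prefactor(k):
    """Coefficient 2^(1/k) / (2 a_k) multiplying exp(a_k * lambda * t) in m_1."""
    return 2.0 ** (1.0 / k) / (2.0 * growth_exponent(k))


def lifetime_cv(k):
    """Coefficient of variation of a lifetime made of k equal exponential stages."""
    return 1.0 / math.sqrt(k)


def population_cv_large_k(k):
    """Large-k approximation sqrt(2) log 2 / sqrt(k) for the population size."""
    return math.sqrt(2.0) * math.log(2.0) / math.sqrt(k)
```

For k = 1 the exponent and the prefactor both reduce to 1, as required by the simple birth process (6); for k = 30 the lifetime coefficient of variation is about 18%, inside the quoted 15-30% range.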
Mutation. As an example of the growth of a heterogeneous
population, we will consider the development of mutant forms 
in bacterial populations. This problem is of particular interest 
when the mutants are resistant to an environment (e.g. a nutrient 
medium impregnated with bacteriophage) unfavourable to the 
original type. The study and comparison with observation of 
particular stochastic models is important for understanding the 
causal mechanism involved—for example, in contrasting theories 
of (i) genetic random mutation occurring independently of the 
environment, and (ii) active interaction of the environment 
with the organism to produce the new resistant types. 

The model we shall examine will assume only two possible
bacterial types, normal and mutant, with a normal bacterium 
dividing into two new normals, or into one normal and one 
mutant, and a mutant dividing always into two mutants (i.e. 
‘back’ mutation in this model is assumed negligible). We shall 
consider the growth prior to plating-out on a phage-impregnated 
medium, and assume that both normal and mutant types grow 
at equal rates (in the absence of mutations). We further assume 
that mutation occurs at the time of division, the chance of one 
of the two new bacteria after division being a mutant being p. 
Then equation (1) is replaced by 


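$$\frac{\partial\Pi_t^{(1)}}{\partial t}=\lambda(x)\left\{q\left([\Pi_t^{(1)}(0)]^2-\Pi_t^{(1)}\right)+p\left(\Pi_t^{(1)}(0)\,\Pi_t^{(2)}(0)-\Pi_t^{(1)}\right)\right\}+\frac{\partial\Pi_t^{(1)}}{\partial x},$$
$$\frac{\partial\Pi_t^{(2)}}{\partial t}=\lambda(x)\left\{[\Pi_t^{(2)}(0)]^2-\Pi_t^{(2)}\right\}+\frac{\partial\Pi_t^{(2)}}{\partial x}, \tag{7}$$

where $\Pi_t^{(1)}(z_1(u),z_2(u)\mid x)$, $\Pi_t^{(2)}(z_1(u),z_2(u)\mid x)$ are the
probability-generating functionals for one initial bacterium of normal and
mutant type respectively (and q = 1 − p). In the simple stochastic
model where we put $\lambda(x)=\lambda$, if further we put all $z_1(u)=z_1$ and
all $z_2(u)=z_2$, these reduce to

$$\frac{\partial\Pi_t^{(1)}}{\partial t}=\lambda q\left\{[\Pi_t^{(1)}]^2-\Pi_t^{(1)}\right\}+\lambda p\left\{\Pi_t^{(1)}\Pi_t^{(2)}-\Pi_t^{(1)}\right\},\qquad\frac{\partial\Pi_t^{(2)}}{\partial t}=\lambda\left\{[\Pi_t^{(2)}]^2-\Pi_t^{(2)}\right\}. \tag{8}$$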
As the second equation involves only $\Pi_t^{(2)}(z_2)$, we may solve it as
for (5), and substitute in the first equation. Leaving $\Pi_t^{(2)}(z_2)$
for the moment arbitrary (for reference later), and putting for
convenience $T=e^{-\lambda t}$, $\Pi_t^{(2)}e^{\lambda t}=\Phi_T$, $\Pi_t^{(1)}e^{\lambda t}=1/Y_T$,
we obtain
$$\frac{\partial Y_T}{\partial T}=p\,\Phi_TY_T+q, \tag{9}$$
with $Y_1(z_1,z_2)=1/z_1$. From (9) we obtain
$$Y_TQ_T=\frac1{z_1}-q\int_T^1Q_v\,dv, \tag{10}$$
where $Q_T=\exp\{p\int_T^1\Phi_v\,dv\}$.

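In particular, putting

$$\Phi_T=\Pi_t^{(2)}/T=z_2/[1-z_2(1-T)],$$
we find

$$Y_T=[1-z_2(1-T)]^p\left[\frac1{z_1}-\frac1{z_2}\right]+\frac{1-z_2(1-T)}{z_2}. \tag{11}$$

For the marginal distribution of mutants, we may put $z_1=1$,
$z_2=z$, and obtain

$$\Pi_t^{(1)}(z)=\frac{Tz}{1-z(1-T)-(1-z)[1-z(1-T)]^p}, \tag{12}$$
starting from one normal bacterium, or
$$[\Pi_t^{(1)}(z)]^n=\left\{\frac{Tz}{1-z(1-T)-(1-z)[1-z(1-T)]^p}\right\}^n, \tag{13}$$

starting from n normal bacteria. In the usual case of p small,
this is approximately

$$[\Pi_t^{(1)}(z)]^n\sim\left\{1-\frac mn\,\frac{1-z}z\log[1-z(1-T)]\right\}^{-n}, \tag{14}$$
where $m=np/T$, or when also T is small,
$$[\Pi_t^{(1)}(z)]^n\sim\left\{1-\frac mn\,\frac{1-z}z\log(1-z)\right\}^{-n}. \tag{15}$$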


It will be seen finally that for n fairly large, a condition under
which stochastic fluctuations in the growth of the normal
bacteria tend to be relatively small, this distribution is approximately
$$[\Pi_t^{(1)}(z)]^n\sim\exp\left\{m\,\frac{1-z}z\log(1-z)\right\}. \tag{16}$$


This asymptotic distribution is rather remarkable in having no 
finite moments, though of course the more precise distribution 
(13) has. We should expect to arrive at the same asymptotic 
form under any reasonable hypothesis on the population growth 
of the normal type consistent with negligible fluctuation in its 
final size. For example, we may alternatively assume that the 
normal type grows ‘deterministically’ with exponential increase.
Confining our attention to the p.g.f. $\Pi_t(z)$ for the number of
mutants, and using for convenience the ‘forward’ equation,
we have (cf. § 3.5)

$$\frac{\partial\Pi_t}{\partial t}=n\lambda pe^{\lambda t}(z-1)\,\Pi_t+\lambda z(z-1)\frac{\partial\Pi_t}{\partial z}, \tag{17}$$
where it is assumed that $\lambda(x)=\lambda$ for the mutant type and the
deterministic growth rate for the normal type is also now taken
to be $\lambda$ (on the previous assumptions it would be slightly less
than $\lambda$, due to the occasional transfer to mutant type). Putting
$m=npe^{\lambda t}$, we obtain from (17)

$$m\frac{\partial\Pi}{\partial m}=m(z-1)\,\Pi+z(z-1)\frac{\partial\Pi}{\partial z}. \tag{18}$$
Solving the partial differential equation (18) by standard
methods, we have the auxiliary equations
$$\frac{dm}m=\frac{dz}{z(1-z)}=\frac{d\Pi}{m(z-1)\,\Pi},$$
with two independent solutions
(a) $m(1-z)/z=\text{constant}$,
(b) $\Pi\,(1-z)^{-m(1-z)/z}=\text{constant}$.
The general solution is thus
$$\Pi\,(1-z)^{-m(1-z)/z}=\psi\{m(1-z)/z\},$$
where, for t=0 or $m=m_0=np$, we have $\Pi=1$, or

$$\psi\{np(1-z)/z\}=(1-z)^{-np(1-z)/z}.$$
This gives for t > 0,
$$\Pi_t(z)=(1-z+npz/m)^{m(1-z)/z}, \tag{19}$$

which as $np/m\to0$ also has the asymptotic form (16). In the
form (19) the moments are still finite; we readily find the mean

$$n\lambda pte^{\lambda t}=\lambda mt=m\log(n_t/n), \tag{20}$$
where $n_t$ is the number $ne^{\lambda t}$ of normal bacteria. The variance is
$$2m(e^{\lambda t}-1)-\lambda mt\sim2mn_t/n. \tag{21}$$
For comparison the mean and variance from (13) are found to be
$$n(T^{-1}-T^{p-1}),\qquad n\left(T^{p-1}-T^{-1}+2pT^{p-2}-2pT^{p-1}+T^{-2}(T^p-1)^2\right), \tag{22}$$

with the same asymptotic values as above when $p, T\to0$ but
$np/T=m$.

The limiting distribution (16) has been tabulated by Lea and
Coulson (1949) for varying values of m. They have also shown
that for $m\ge4$, say, a useful approximation is to take the quantity

$$\frac{11.6}{r/m-\log m+4.5}-2.02 \tag{23}$$

as a normal variate with zero mean and unit standard deviation,
where r is the observed number of mutants. Methods of statistical
estimation (see Chapter 8) applied to such observed numbers 
from parallel cultures enable estimates of the mutation rate to 
be obtained. A simple though not fully efficient estimate is ob- 
tained from the median of the observed distribution. However, 
the use of the upper quartile, while nominally even less efficient, 
has the practical advantage of being more insensitive to the 
phenomenon of phenotypic delay. 
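The limiting distribution (16) can be tabulated without special tables. The sketch below assumes only the p.g.f. in (16): expanding its logarithm gives $m[-1+\sum_{j\ge1}z^j/\{j(j+1)\}]$, a compound Poisson form, from which the probabilities follow by a standard recursion. The second function evaluates the quantity (23) with the constants 11.6, 4.5 and 2.02 as quoted there. Function names are illustrative and do not come from Lea and Coulson.

```python
import math


def lea_coulson_pmf(m, r_max):
    """Probabilities p_0..p_{r_max} of the limiting mutant distribution (16).

    Uses p_0 = exp(-m) and r * p_r = m * sum_{j=1..r} p_{r-j} / (j + 1),
    which follows from the compound Poisson form of the p.g.f.
    """
    probs = [math.exp(-m)]
    for r in range(1, r_max + 1):
        total = sum(probs[r - j] / (j + 1) for j in range(1, r + 1))
        probs.append(m * total / r)
    return probs


def lea_coulson_normal_score(r, m):
    """Quantity (23), treated as approximately a standard normal variate."""
    return 11.6 / (r / m - math.log(m) + 4.5) - 2.02
```

As a check on the recursion, the tabulated probabilities for m = 2 reproduce the p.g.f. (16) at z = 1/2, where it equals exactly 1/4. The score (23) vanishes where r/m − log m is close to 1.24, which locates the median of the distribution.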

This further complication is that the phenotypic appearance 
of the mutant type appears in some cases to be delayed a few 
generations, and it may be advisable to allow for this effect 
before the extent of agreement with observation is finally 
examined. The effect of smaller stochastic fluctuations in the 
reproduction of the mutant population should of course also be 
considered. This may be shown to halve the variance (21) in the 
limiting case of deterministic growth for both normal and mutant
populations, but the asymptotic distribution (16), for which the 
variance has become infinite, is not affected. 

With regard to the phenotypic delay, let us consider again the
first of equations (8). This was solved first for arbitrary $\Pi_t^{(2)}$,
and we are at liberty to interpret it in the general solution (10)
as the p.g.f. of the number of resistant bacteria (i.e. bacteria in
which the mutation is manifesting itself phenotypically) at time
t, given one bacterium just mutated at time 0. If we are interested
in the marginal distribution of resistant bacteria, we take
$\Pi_t^{(1)}$, $\Pi_t^{(2)}$ as functions merely of one further variable z, associated
with the number of resistant bacteria, putting $z_1=1$ in (10).
Suppose, for example, the multiplication of the mutated bacteria
is, like the normal type, assumed to be of simple Markovian type
with growth rate $\lambda$, but the chance that a mutated bacterium is
immediately resistant is P and that if non-resistant, the chance
of one of the two new mutant bacteria after its division being
resistant is P, and so on. Then in place of $\Pi_t^{(2)}$ we put

$$P\,\Pi_t^{(2)}(z)+(1-P)\,\Pi_t^{(1)}(z),$$

where $\Pi_t^{(2)}(z)$ is $Tz/[1-z(1-T)]$ and $\Pi_t^{(1)}(z)$ is given by (12) with
P in place of p.

In the case of the general solution (10) (which assumes, however,
a simple stochastic multiplicative process for the normal
bacteria), if we put $z_1=1$, and regard $\Pi_t^{(1)}(z)$, $\Pi_t^{(2)}(z)$ as p.g.f.'s
for the resistant bacteria only, D. G. Kendall (1952) has established
that as $p, T\to0$ but $p/T$ remains constant,

$$\Pi_t^{(1)}(z)\to\left\{1+pe^{\lambda t}\int_0^\infty[1-\Pi_u^{(2)}(z)]\,e^{-\lambda u}\lambda\,du\right\}^{-1}, \tag{24}$$

and correspondingly, for n initial normal bacteria, the limit is
this expression raised to the nth power. If, however, p and
$1/n\to0$ with np and t remaining constant, the resulting distribution
starting from n initial bacteria tends to

$$\exp\left\{-npe^{\lambda t}\int_0^t[1-\Pi_u^{(2)}(z)]\,e^{-\lambda u}\lambda\,du\right\}. \tag{25}$$

When we substitute $\Pi_t^{(2)}(z)=Tz/[1-z(1-T)]$, these equations
become equivalent to earlier results. We may, moreover, as
already noted for this special case, expect the second limit to
be relevant for more general types of normal growth consistent
with negligible fluctuations in the size of the normal bacterial
population.


4.32 Population genetics. Several brief references to
genetical problems have occurred in earlier sections, for example,
to inbreeding in § 2.2, to natural selection in § 2.3, and, in relation
to bacterial populations, in the last section; the theory of genetic
recombination was also mentioned in § 2.11. The importance of
the theory of stochastic processes to genetical and evolutionary 
theory should therefore have already become apparent, and the 
main purpose of the present section is to remind the reader of 
this by a further example in the domain of population genetics. 
Mathematical work on the genetical theory of evolution by 
natural selection is largely due to R. A. Fisher, J. B. S. Haldane 
and Sewall Wright, their work forming as technical a body of 
research as that in statistical mechanics, say, and requiring as 
detailed a study. More recently, the relation of their work to 
stochastic process theory has begun to be examined, in some 
cases putting it on a broader or more complete basis and in others 
bringing out more clearly the underlying assumptions. 

We shall consider again a mutation problem as in the last
section, but in a more purely genetic context. We suppose that
there is a population of size $N_t$ each individual of which carries
a pair of genes AA, Aa or aa (cf. Example 2 in § 2.2). The
number of A genes in the total population will be denoted by $A_t$
and we write further $A_t/(2N_t)=P_t$. We shall assume that the
bivariate process $A_t$, $N_t$ is a Markov one in time t. This cannot be
strictly true, for the exchange of genes will be related in in- 
dividual genealogical lines to generation times, which we have 
seen must depend on age-dependent birth- and death-rates, but 
over the entire population it is a more realistic assumption, 
except perhaps for short-term effects if generation times are 
strictly periodic. Any effect of other genes has of course been 
ignored. 

The detailed evolution of $A_t$ and $N_t$ will also depend on the
phenotypic composition of the population at any instant, and its
breeding characteristics. We can only hope to be able to neglect
such factors in detail if we view the evolutionary changes in a
rather broad manner. This means that only large changes compared
with individual transitions can be considered, and we
shall in effect be specifying these changes by an asymptotic
diffusion process, though one of a Markov type (i.e. of the general
type in equation (4), § 3.5, or its extension to multivariate
processes) rather than of the simple additive type.
We have seen that the behaviour of a finite population can be 
fundamentally different according to whether its total size is 
expected to increase or decrease, and we may expect this to 
apply also to a population of mixed genetic constitution. In 
many situations it is perhaps reasonable to suppose that the 
total size $N_t$ is fairly restricted by the environment, and keeps
approximately constant, but, as Feller (1951) has stressed, this
is a drastic assumption which severely limits the class of process
studied. It would be useful, as in the processes of the last section,
to study the joint distribution of $A_t$ and $N_t$ under wider
conditions. Under general mutation and selection changes this
is, however, not easy, and we consider here the solution under
the special but usual assumption of N constant, following
a treatment due to Malécot (1952).
As $N_t=N$ is constant, the Markov process is one in $A_t$ only,
or equivalently in $P_t=\tfrac12A_t/N$. The equation for $M_t(\theta)\equiv E\{e^{\theta P_t}\}$
is then (equation (2), § 3.5, dropping t in $\Psi$ for any homogeneous
case)
$$\frac{\partial M_t(\theta)}{\partial t}=\Psi\left(\theta,\frac{\partial}{\partial\theta}\right)M_t(\theta), \tag{1}$$
where
$$\Psi(\theta,P_t)=\lim_{\Delta t\to0}\frac{E\{e^{\theta\,\Delta P_t}-1\mid P_t\}}{\Delta t}.$$
At—0 


We now make the following simplifying assumptions:

(i) For small $\Delta t$, there is a chance $\kappa NP_tQ_t\,\Delta t$, where $Q_t=1-P_t$,
of a single gene transition $A\to a$, and similarly of $a\to A$, due to
the random shuffling of genes in the offspring of any mating.
(Strictly speaking, these changes depend on the phenotypic
male and female frequencies of the AA, Aa and aa gene pairs,
even under random mating conditions, but it is in any case
hardly possible to specify these changes more realistically under
the artificial condition that $N_t$ is constant.)

(ii) Selection is assumed to operate on the ratio $P_t/Q_t$ of A
to a genes, changing its value deterministically (again, for N
constant, it seems difficult to formulate this more realistically)
by an amount $\sigma\,\Delta t\,(P_t/Q_t)$ in $\Delta t$; this implies a change in $P_t$ of
$\sigma P_tQ_t\,\Delta t$.

(iii) There is a random mutation rate $\mu$ from A to a, i.e. the
chance of such a mutation in $\Delta t$ is $2NP_t\mu\,\Delta t$; similarly, the rate
from a to A is $\nu$.

Assumptions (ii) and (iii) give a change in the mean of $P_t$ of
$[\sigma P_tQ_t+(\nu Q_t-\mu P_t)]\,\Delta t$; assumption (i) gives no change in mean,
but a variance $\tfrac12\kappa P_tQ_t\,\Delta t/N$. The variance due to (iii) is

$$\tfrac12(\nu Q_t+\mu P_t)\,\Delta t/N.$$

The third-cumulant contribution from (i) and (iii) is $O(1/N^2)$.
Hence
$$\Psi(\theta,P_t)=\theta\,[\sigma P_tQ_t+\nu Q_t-\mu P_t]+\tfrac14\theta^2[\kappa P_tQ_t+\nu Q_t+\mu P_t]/N+O[\theta^3/N^2]. \tag{2}$$
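The equation (1) thus becomes a diffusion equation if we assume
we may neglect the terms of $O(1/N^2)$, so that $\theta$ only appears
explicitly in $\Psi$ as a quadratic; equation (1) is, moreover, a partial
differential equation of the second order in $\partial/\partial\theta$. If we assume
that the solution of (1) has a limiting distribution, this must
satisfy the equation

$$\Psi\left(\theta,\frac{\partial}{\partial\theta}\right)M(\theta)=0. \tag{3}$$

For example, if the mutation rates $\nu$ and $\mu$ are zero, we have
the equation

$$[\sigma\theta+\tfrac14\kappa\theta^2/N]\left[\frac{\partial}{\partial\theta}-\frac{\partial^2}{\partial\theta^2}\right]M=0. \tag{4}$$

This has the solution $Ae^\theta+B$. Moreover, from the full equation
containing $\partial M_t/\partial t$, we see that $\partial M_t/\partial t=0$ when $\theta=0$ or $-4N\sigma/\kappa$.
Hence for these values of $\theta$, the limiting $M(\theta)$ is equal to its
initial value $e^{r\theta}$, so that

$$A+B=1,\qquad Ae^{-4N\sigma/\kappa}+B=e^{-4N\sigma r/\kappa},$$

whence
$$M=\frac{1-e^{-4N\sigma r/\kappa}}{1-e^{-4N\sigma/\kappa}}\,(e^\theta-1)+1, \tag{5}$$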
giving the complementary chances of extinction of the A or a
genes in the absence of mutation. Malécot has shown further
that this distribution is in fact the limiting solution of (1) and
(2) for $\mu$ and $\nu$ zero.
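The two chances in (5) are easily evaluated. The sketch below assumes the limiting m.g.f. has the form $Ae^\theta+B$ with $A=(1-e^{-4N\sigma r/\kappa})/(1-e^{-4N\sigma/\kappa})$, so that A is the chance that $P_t$ ends at 1 (the a genes become extinct) and $B=1-A$ the chance that it ends at 0. The argument `r` is the initial value of $P_t$; the function names are illustrative.

```python
import math


def chance_A_fixed(r, N, sigma, kappa):
    """Chance that the a genes become extinct, i.e. the coefficient A in (5)."""
    s = 4.0 * N * sigma / kappa
    if abs(s) < 1e-12:
        # no selection: the limit of the ratio as s -> 0 is the initial frequency
        return r
    # (1 - exp(-s r)) / (1 - exp(-s)), written with expm1 for accuracy
    return math.expm1(-s * r) / math.expm1(-s)


def limiting_mgf(theta, r, N, sigma, kappa):
    """Limiting m.g.f. (5): A (e^theta - 1) + 1."""
    return chance_A_fixed(r, N, sigma, kappa) * (math.exp(theta) - 1.0) + 1.0
```

For example, with `r = 0.1`, `N = 100`, `sigma = 0.01`, `kappa = 1` the chance that A is fixed is about 0.34, well above the initial frequency, showing how a small selective advantage is magnified when $4N\sigma/\kappa$ is of order unity or more.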

As a second fairly tractable case, we suppose $\sigma=0$ and $\mu$, $\nu$
are $O(1/N)$; then expanding $M_t$ in powers of $\theta$, we find the equation
for $m_s\equiv E\{P_t^s\}$

$$\frac{dm_s}{dt}=-s\nu(m_s-m_{s-1})-s\mu m_s-\frac{\kappa s(s-1)}{4N}(m_s-m_{s-1})+O\left(\frac1{N^2}\right), \tag{6}$$

and in particular, as $m_0=1$,

$$\frac{dm_1}{dt}=-\nu(m_1-1)-\mu m_1,$$
whence
$$m_1=\left[r-\frac{\nu}{\mu+\nu}\right]e^{-(\mu+\nu)t}+\frac{\nu}{\mu+\nu}.$$

The equation (6) may be solved successively for $m_2, m_3,\dots$, and
it is then evident that $m_s$ has a limit as $t\to\infty$ for all s. This must
therefore be given by the recurrence relation

$$m_s[s-1+4N(\mu+\nu)/\kappa+O(1/N)]=m_{s-1}[s-1+4N\nu/\kappa+O(1/N)], \tag{7}$$

which defines the continuous distribution for $P_t$ (independently
of r)
$$f(p)\,dp\propto p^{4N\nu/\kappa-1}(1-p)^{4N\mu/\kappa-1}\,dp. \tag{8}$$

This result may be obtained more simply from a general
formula due to Wright

$$f(p)\propto\frac1{\sigma^2(p)}\exp\left\{2\int\frac{m(p)}{\sigma^2(p)}\,dp\right\}, \tag{9}$$
where m(p) is the change in mean per unit time, and $\sigma^2(p)$ the
change in variance. We put

$$m(p)=\nu(1-p)-\mu p\quad\text{and}\quad\sigma^2(p)=\tfrac12\kappa p(1-p)/N,$$
and (8) follows. The formula (9) was, however, obtained merely
by requiring the mean and variance of any stationary distribution
with probability density to remain constant, and the
solution (5) shows that it does not give the correct solution in
all cases.
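The agreement between the recurrence (7) and the density (8) can be checked numerically. The sketch below assumes the $O(1/N)$ corrections in (7) are dropped, so that the limiting moments are those of a Beta distribution with parameters $4N\nu/\kappa$ and $4N\mu/\kappa$; the function names are illustrative.

```python
import math


def limiting_moments(N, mu, nu, kappa, s_max):
    """Moments m_0..m_{s_max} from the recurrence (7), O(1/N) terms dropped."""
    a = 4.0 * N * nu / kappa
    b = 4.0 * N * mu / kappa
    moments = [1.0]                      # m_0 = 1
    for s in range(1, s_max + 1):
        moments.append(moments[-1] * (s - 1 + a) / (s - 1 + a + b))
    return moments


def beta_moment(a, b, s):
    """s-th moment of the density proportional to p^(a-1) (1-p)^(b-1)."""
    return math.exp(math.lgamma(a + s) + math.lgamma(a + b)
                    - math.lgamma(a) - math.lgamma(a + b + s))
```

With `N = 1000`, `mu = 0.001`, `nu = 0.002`, `kappa = 1` the first moment is $\nu/(\mu+\nu)=2/3$, matching the limit of $m_1$ above, and the higher moments agree with `beta_moment(8, 4, s)`.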
4.4 Epidemic models

A further important application of the techniques developed 

in Chapter 3 is to the mathematical theory of epidemics. We 
shall illustrate this by examining one or two simple but typical 
stochastic models associated with transmission of infection from 
person to person. As in the case of population growth a large 
part of the literature has been devoted to ‘deterministic’ forms 
of the equations (see, for example, Kermack and McKendrick,
1927-33; Soper, 1929).† Such discussions have been useful in
often indicating sufficiently well the growth and spread of
epidemics when numbers are large, but evidently may not be
adequate when numbers are small and in particular in the early
stages of an epidemic, so that it is essential to examine also the
extent to which such studies are supported when more complete
stochastic representations are employed.

Let us suppose that a population has at any time t a number 
S(t) of individuals susceptible to a certain disease, and a number 
I(t) of individuals actually infected. In problems for which re- 
covered individuals are only temporarily immune from a further 
attack, we should need to specify such individuals also, but we 
suppose recovered individuals are permanently immune, and are 
similar to isolated (or dead) individuals in not giving rise to new
infections. We shall for the moment assume that S (but not I) can be
augmented by new susceptibles entering the population from outside;
for example, in the case of measles, which is one of the most
convenient epidemic diseases available for study, children are
constantly growing up into the critical age period. To be precise,
we assume the following idealized scheme of random transitions
(cf. § 3.5), where the p.g.f. variable z corresponds to S and w to I:


Type of transition | Rate | Operator
$zw\to w^2$ | $\lambda$ | $\partial^2/\partial z\,\partial w$
$w\to1$ | $\mu$ | $\partial/\partial w$
$1\to z$ | $\nu$ | 1


The equation for the p.g.f. $\Pi_t(z,w)$ is correspondingly

$$\frac{\partial\Pi_t}{\partial t}=\lambda(w^2-zw)\frac{\partial^2\Pi_t}{\partial z\,\partial w}+\mu(1-w)\frac{\partial\Pi_t}{\partial w}+\nu(z-1)\,\Pi_t. \tag{1}$$


† See also, however, McKendrick (1926).
Differentiating this equation in turn with respect to w and z,
and then putting z = w = 1, we obtain the two equations

$$\frac{dx}{dt}=\lambda E(SI)-\mu x,\qquad\frac{dy}{dt}=-\lambda E(SI)+\nu, \tag{2}$$

where $x\equiv E\{I\}$, $y\equiv E\{S\}$. In the deterministic model E(SI) is
replaced by xy, random variation in S and I being ignored.
The logistic process. A special well-known case is obtained by
writing $\mu=\nu=0$, $S=n-I$, the last condition corresponding to
the assumption that all individuals not infected in some finite
constant population are susceptible. This gives the deterministic
model

$$\frac{dx}{dt}=\lambda x(n-x), \tag{3}$$
with solution for the total number of infected up to time t,
$$x=n/[1+e^{-n\lambda(t-t_0)}], \tag{4}$$
and rate of infection
$$\frac{dx}{dt}=\tfrac14\lambda n^2\operatorname{sech}^2[\tfrac12\lambda n(t-t_0)], \tag{5}$$

where $t_0$ coincides with the time of maximum infection rate.

The corresponding stochastic equation may be obtained (either
by direct argument or from (1) by writing $\Pi(z,w)=z^n\,\Pi(w/z)$
and then putting z = 1) as

$$\frac{\partial\Pi_t}{\partial t}=\lambda(w^2-w)\left[(n-1)\frac{\partial\Pi_t}{\partial w}-w\frac{\partial^2\Pi_t}{\partial w^2}\right], \tag{6}$$

with $\Pi_0(w)=w^j$, say. Equivalently, in terms of the m.g.f.
$M_t(\theta)$, equation (6) becomes

$$\frac{\partial M_t}{\partial t}=\lambda(e^\theta-1)\left[n\frac{\partial M_t}{\partial\theta}-\frac{\partial^2M_t}{\partial\theta^2}\right], \tag{7}$$

and from either equation, differentiating once with respect to
w or $\theta$ and then putting $w=e^\theta=1$, we easily obtain the equation

$$\dot m_1=\lambda(nm_1-m_2) \tag{8}$$
for the rate of change $\dot m_1$ of the mean $m_1\equiv E\{I\}$, where $m_2$ is
$E\{I^2\}$. It will be noticed that the non-linear nature of these
models prevents a straightforward solution for the moments,
though (8) allows $m_2$ to be calculated if $m_1$ is known. By writing
(8) in the form
$$\dot m_1=\lambda(nm_1-m_1^2)-\lambda\sigma^2, \tag{9}$$
where $\sigma^2$ is the variance of I, and comparing it with the deterministic
equation (3), we see that the effect of the variance $\sigma^2$ is
to depress the initial rise of I on the average. The exact solution
of (6) or (7) can in principle be obtained, being in fact a special
case of the homogeneous birth process referred to in § 3.2, but
the detailed solution is rather complicated (see Bailey, 1950).
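The depression of the mean implied by (9) can be seen numerically without the exact solution. The following sketch assumes the stochastic logistic process is the pure birth chain in which I increases by one at rate $\lambda I(n-I)$, and integrates its forward equations by a simple Euler step; it is compared with the deterministic solution (4) started from the same value. The step length and function names are illustrative choices.

```python
import math


def logistic_deterministic(t, n, lam, x0):
    """Solution (4) of dx/dt = lam x (n - x) with x(0) = x0."""
    return n / (1.0 + (n / x0 - 1.0) * math.exp(-n * lam * t))


def stochastic_mean(t, n, lam, i0, dt=1e-3):
    """Mean of I(t), starting from I(0) = i0, by Euler integration."""
    prob = [0.0] * (n + 1)
    prob[i0] = 1.0
    rate = [lam * i * (n - i) for i in range(n + 1)]   # rate of i -> i + 1
    for _ in range(int(round(t / dt))):
        new = prob[:]
        for i in range(i0, n + 1):
            flow = rate[i] * prob[i] * dt
            new[i] -= flow
            if i < n:
                new[i + 1] += flow
        prob = new
    return sum(i * p for i, p in enumerate(prob))
```

For n = 20, $\lambda$ = 0.01 and one initial infective, the deterministic value at t = 10 is about 5.6, and the stochastic mean computed this way lies below it, as (9) indicates.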
Deterministic approximations. Returning to the deterministic
form of the equations (2), we may reduce these to equations
for deviations about equilibrium, the latter corresponding
to $y_0=\mu/\lambda$, $x_0=\nu/\mu$, by writing $x=x_0(1+u)$, $y=y_0(1+v)$, whence

$$\frac{du}{dt}=\frac1\tau\,v(1+u),\qquad\frac{dv}{dt}=-\frac1\sigma\,(u+v+uv), \tag{10}$$

where $\tau=1/\mu$, $\sigma=\mu/(\nu\lambda)$. Some information on the nature of
solutions of these equations is obtained by considering the case
u and v small. Neglecting uv and eliminating v, we then obtain
for u the equation

$$\frac{d^2u}{dt^2}+\frac1\sigma\frac{du}{dt}+\frac1{\tau\sigma}\,u=0, \tag{11}$$

with solution
$$u=u_0e^{-t/(2\sigma)}\cos\theta t\quad\left(\theta=\sqrt{1/(\tau\sigma)-1/(4\sigma^2)}\right), \tag{12}$$

for suitable choice of time origin. The solution for v is then
$$v=u_0\sqrt\beta\,e^{-t/(2\sigma)}\cos(\theta t+\psi)\quad(0\le\psi\le\pi), \tag{13}$$

where $\cos\psi=-\tfrac12\sqrt\beta$, $\beta=\tau/\sigma$. These solutions represent damped
harmonics with period $2\pi/\theta$. In the case of measles, Soper
(1929) took $\tau$ as equivalent to an incubation period of a fortnight,
and estimating $\sigma$ for London as 68.2 weeks, calculated
a period of 73.7 weeks from (12), and a damping factor from
peak to peak of 0.58. These figures for small oscillations are
comparatively insensitive to the oscillation amplitude, as may
be ascertained either by direct arithmetic computation or by the
development of the solution of the non-linear equations (10) as
a power series in the solution of the linearized form. Up to the
second-order terms we write

$$u=u_1+a_{11}u_1^2+a_{12}u_1v_1+a_{22}v_1^2,\qquad v=v_1+b_{11}u_1^2+b_{12}u_1v_1+b_{22}v_1^2, \tag{14}$$

where $u_1$, $v_1$ are the linearized or first-order solutions. Then by
straightforward investigation it is found that, if

$$a'_{ij}=(9-2\beta)\,a_{ij},\qquad b'_{ij}=(9-2\beta)\,b_{ij},$$
then
$$a'_{11}=3+2\beta,\quad a'_{12}=2\beta-1,\quad a'_{22}=2-3/\beta,$$
$$b'_{11}=\beta-2\beta^2,\quad b'_{12}=3+3\beta-2\beta^2,\quad b'_{22}=5-2\beta. \tag{15}$$
In cases where $\beta$ is small, so that $\psi\sim90°$, this solution becomes
approximately

$$u=u_1+\tfrac13u_0^2e^{-t/\sigma}\cos2\theta t,\qquad v=v_1(1+\tfrac13u_1), \tag{16}$$

where $u_1$ is given by (12) and $v_1=-u_0\sqrt\beta\,e^{-t/(2\sigma)}\sin\theta t$.
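Soper's figures quoted for the linearized solution (12) can be reproduced in a few lines. The sketch assumes only the period $2\pi/\theta$ with $\theta^2=1/(\tau\sigma)-1/(4\sigma^2)$ and the damping $e^{-t/(2\sigma)}$ taken over one period; the function name is illustrative.

```python
import math


def linearized_oscillation(tau, sigma):
    """Return (period, peak-to-peak damping factor) for the solution (12)."""
    theta_squared = 1.0 / (tau * sigma) - 1.0 / (4.0 * sigma ** 2)
    if theta_squared <= 0.0:
        raise ValueError("no oscillation: need 1/(tau sigma) > 1/(4 sigma^2)")
    period = 2.0 * math.pi / math.sqrt(theta_squared)
    damping = math.exp(-period / (2.0 * sigma))
    return period, damping


# Soper's London measles values: tau = 2 weeks, sigma = 68.2 weeks
period, damping = linearized_oscillation(2.0, 68.2)
```

With these inputs the period comes out close to 73.7 weeks and the damping factor close to 0.58, in agreement with the values cited.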

These models may be extended to cover various modifications 
and additions to make them more realistic or more applicable to 
other situations—for example, (i) when immunity after infection 
and recovery is only partial or temporary, (ii) when a seasonal 
effect is present, or (iii) when the non-infectious and infectious 
parts of the incubation period are more precisely considered. 
The device used in § 4.31 of introducing a number of substates
before the final transition to a new state takes place is convenient
for modifying the lifetime distribution in any state in
the stochastic model, though of course it increases the intractability
of the equations. In the extreme case of a constant non-infectious
period a immediately after infection, the deterministic
form of (2) is readily seen to be

$$\frac{dx}{dt}=\lambda x(t-a)\,y(t)-\mu x(t),\qquad\frac{dy}{dt}=-\lambda x(t-a)\,y(t)+\nu. \tag{17}$$
The equilibrium values of x and y are as before, and if $D\equiv\partial/\partial t$,
we find
$$\left(D+\frac1\sigma\right)v+\frac1\sigma\,e^{-aD}u=0,\qquad-\frac1\tau\,v+\left(D+\frac1\tau-\frac1\tau\,e^{-aD}\right)u=0.$$

If we suppose that the infectious period is short, we have $\mu$
and $\lambda$ large, such that $\mu/\lambda$ remains as before. Thus $\tau\to0$, and the
equation for u then reduces to

$$\left[\left(D+\frac1\sigma\right)e^{aD}-D\right]u=0. \tag{18}$$
The behaviour of u depends on the nature of the roots of
$$\left(D+\frac1\sigma\right)e^{aD}-D=0.$$

For $\sigma$ fairly large, we have

$$aD=-\log[1+1/(\sigma D)]=-\frac1{\sigma D}+\frac1{2\sigma^2D^2}-\dots,$$

with first approximation $D=i/\sqrt{a\sigma}=i\theta$, second approximation

$$D=\frac i{\sqrt{a\sigma}}-\frac1{4\sigma},$$
which corresponds to the same type of solution as before with a
in place of $\tau$, and a damping term which has been halved (this
may be confirmed more rigorously if required by taking the case
of a finite number k of substates in the non-infectious part of
the incubation period, and then letting k increase).

We have stressed that these deterministic formulations, while 
of some value in indicating the structural tendencies of the 
process in large populations, may be misleading if not supple- 
mented by consideration of more complete stochastic models. 
This is so even for the simplest epidemic models such as the 
logistic process if it is remembered that owing to their non- 
linearity the equations are not independent of ‘scale’ even in 
their deterministic form. Thus the infectivity rate $\lambda$ in (1), to
be consistent with the numbers cited for London measles, has
to be taken as 1/300,000. This is obviously too small except as
an average of a highly variable chance of infection from children
scattered over the whole of London. It is more relevant to
consider a comparatively homogeneous group such as a school, 
for which stochastic fluctuations are relatively larger. 

Contrasting stabilities in deterministic and stochastic models. 
However, a much more striking and important contrast between 
deterministic and stochastic epidemic models may be indicated. 
With the deterministic model associated with (1), the presence 
of the damping factor implies that an endemic stable equilibrium 
level should be reached; for infectious diseases such as measles 
this is not what is observed in practice. Modifications of the 
infectivity assumptions may reduce but do not eliminate this 
damping tendency, while seasonal changes in infectivity may 
be shown merely to induce regular annual oscillations about such 
an equilibrium level. 

It will be recalled that stochastically this equilibrium level is 
precisely the one that is not permanent, as we saw in the theory 
of population growth; there the ultimate situation was either 
a very large population or extinction. This phenomenon appears 
to give a real meaning to the idea of a threshold level of susceptibles,
below which a major epidemic will not occur. Consider,
for example, an isolated susceptible population of size n, as in
the logistic process, but where $\mu\neq0$, so that the infected individuals
are removed in due course. Equation (1) is then

$$\frac{\partial\Pi_t}{\partial t}=\lambda(w^2-zw)\frac{\partial^2\Pi_t}{\partial z\,\partial w}+\mu(1-w)\frac{\partial\Pi_t}{\partial w}. \tag{19}$$

This situation can approximately represent (1) when $\nu$ is small,
at the stage when one or two infected persons are introduced
into a population previously free of infection. At the beginning
of any epidemic, S will decrease from n as I increases, and the
chance of a new infection will therefore be somewhat less than
if we suppose S remains at the value n. But with this modification,
(19) becomes simply the birth-and-death process

$$\frac{\partial\Pi_t}{\partial t}=[\lambda n(w^2-w)+\mu(1-w)]\frac{\partial\Pi_t}{\partial w}. \tag{20}$$
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Hence if An <y the infection will fade out, possibly not until 
after a ‘minor outbreak’ but without causing a ‘major epidemic’. 
IfAn > y, there will be (see § 3-4) an approximate chance [ue] (An)} 
of the infection becoming extinct before it takes hold of the 
susceptible population, where j is the initial number of infected 
individuals. This chance tends to zero as n and/or j increase. 
(For a more exact investigation for finite n of the distribution 
of total size of an epidemic for the model (19), see Bailey (1953).) 
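
The threshold rule just stated can be put in a few lines of Python. This is only a sketch of the approximation quoted from § 3-4; the function name and argument names are illustrative and not part of the text.

```python
def early_extinction_chance(lam, n, mu, j=1):
    """Approximate chance that an infection started by j cases dies out
    before taking hold, for infection rate lam, n susceptibles and
    removal rate mu (birth-and-death approximation with S fixed at n)."""
    ratio = mu / (lam * n)
    # Below the threshold (lam * n <= mu) the infection fades out with certainty.
    if ratio >= 1.0:
        return 1.0
    # Above the threshold each of the j initial lines of infection
    # dies out independently with chance mu / (lam * n).
    return ratio ** j
```

For fixed rates the returned chance falls geometrically in j, which is the sense in which it tends to zero as n and/or j increase.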

Returning to the problem of the continuous epidemiological 
history of an isolated population, we see that in contrast to the 
deterministic endemic state, the infection will (under the above 
assumptions) disappear when the number of susceptibles is low 
enough, and can only enter the population again from outside. 
In a group which is not completely isolated (like a boarding- 
school) a continually recurring series of epidemics (with no 
damping) can, however, occur by a 'triggering-off' of each
epidemic by fresh infection from outside whenever the number of 
susceptibles has reached a high enough level. 

An approximate assessment of the distribution of the 'renewal'
time of a major epidemic (under simplified assumptions)
is as follows. Let the chance of one new infection from outside
in $(t, t + \delta t)$ be $\epsilon\,\delta t + o(\delta t)$, and of none,
$1 - \epsilon\,\delta t + o(\delta t)$. We suppose
the susceptibles S are increasing at a much greater rate, and
represent this increase by a deterministic rate $\nu$. We suppose for
definiteness that the initial number of susceptibles at t = 0 is
negligible, so that the chance of 'extinction' of any new infection
that enters is approximately $\mu/(\lambda\nu t)$ when $\lambda\nu t > \mu$.
For $\epsilon$ small we neglect the delay in the growth of an epidemic
(if any) from fresh infection, so that the chance of no epidemic up to
time t is, for $t > \mu/(\lambda\nu)$,

$$\prod_{u > \mu/(\lambda\nu)} \left[1 - \epsilon\,du + \frac{\mu\epsilon\,du}{\lambda\nu u}\right] = \exp\left\{-\epsilon \int_{\mu/(\lambda\nu)}^{t} \left[1 - \frac{\mu}{\lambda\nu u}\right] du\right\} = \left(\frac{\lambda\nu t}{\mu}\right)^{\epsilon\mu/(\lambda\nu)} \exp\left\{-\epsilon\left(t - \frac{\mu}{\lambda\nu}\right)\right\}. \qquad (21)$$

The equivalent frequency distribution, in terms of $T = \lambda\nu t/\mu$, is

$$r\,(T - 1)\,T^{r-1} e^{-r(T-1)} \qquad (r = \epsilon\mu/(\lambda\nu)), \qquad (22)$$

with a mode at $T = 1 + 1/\sqrt{r}$. The above treatment is rather rough,
but illustrates sufficiently well the stochastic mechanism; a
more precise solution for the time of outbreak under the above
assumptions may be obtained by means of the theory of the
non-homogeneous birth-death-and-immigration process (§ 3-41)
with birth-rate $\lambda\nu t$.
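
As a rough numerical check on (21) and (22), the following Python sketch evaluates the chance of no epidemic and the renewal-time density in the scaled time T, and the stated position of the mode. The function names are illustrative only; r stands for the combination of rates defined in (22).

```python
import math

def no_epidemic_chance(T, r):
    """Chance (21) of no major epidemic up to scaled time T; before the
    threshold T = 1 every introduced infection is assumed to die out."""
    if T <= 1.0:
        return 1.0
    return T ** r * math.exp(-r * (T - 1.0))

def renewal_density(T, r):
    """Frequency distribution (22) of the scaled renewal time T."""
    if T <= 1.0:
        return 0.0
    return r * (T - 1.0) * T ** (r - 1.0) * math.exp(-r * (T - 1.0))

def mode_of_renewal_time(r):
    """Mode of (22), i.e. the root of r * (T - 1)**2 = 1 with T > 1."""
    return 1.0 + 1.0 / math.sqrt(r)
```

The density is minus the derivative of the chance of no epidemic, so a finite-difference slope of `no_epidemic_chance` should agree with `renewal_density` at any T > 1.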

The distinction between ‘minor outbreaks’ (i.e. extinction 
after a small number of infection transmissions) and ‘major 
epidemics’ (i.e. spread of the infection through the susceptibles) 
will be less clear-cut for t in the neighbourhood of $\mu/(\lambda\nu)$, as minor
outbreaks for $\lambda S/\mu$ approaching unity may be fairly prolonged,
and major epidemics for $\lambda S/\mu$ not much larger than unity will be
curtailed by the finite size of S; as t and S increase, however, the
probable size of the epidemic will increase. 

This type of stochastic mechanism has been demonstrated by 
means of artificial epidemic series constructed with the aid of 
random numbers. As infections multiply rapidly when an 
epidemic is under way and may be few and far between in
quiescent periods between epidemics, it is usually convenient to
employ the type of construction used for the simple birth-and-death
process (fig. 4, Chapter 3), and determine the random
times at which events take place rather than the values of the
stochastic variables at fixed times. Thus for the last model
(reverting for greater convenience to a random entry of susceptibles
at average rate $\nu$) the interval before a new occurrence
has exponential distribution with mean $1/[\epsilon + \nu + \lambda IS + \mu I]$, and
the relative chances of (i) $I \to I + 1$, $S \to S$, (ii) $S \to S + 1$, $I \to I$,
(iii) $I \to I + 1$, $S \to S - 1$, (iv) $I \to I - 1$, $S \to S$ are
$\epsilon : \nu : \lambda IS : \mu I$. In
fig. 6 (p. 133) part of an artificial series of this type is shown. In
this particular series various modifications were introduced in
order to make it more representative of real measles incidence in
a boarding-school, with a 10% seasonal variation of infectivity,
and the entry of new susceptibles, and occasional infected
children, restricted to the beginning of terms (see Bartlett, 1953);
the essential features were, however, similar.
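
The construction by random event times can be sketched in Python as follows. This is a minimal illustration of the unmodified model only (no seasonal variation and no restriction of entries to the beginning of terms); the function name, argument names and the use of a seeded generator are assumptions made for the example, and it is assumed that eps + nu > 0 so that the total rate never vanishes.

```python
import random

def simulate_epidemic(eps, nu, lam, mu, s0, i0, t_max, seed=0):
    """Artificial epidemic series: returns a list of (time, S, I) after each event."""
    rng = random.Random(seed)
    t, s, i = 0.0, s0, i0
    path = [(t, s, i)]
    while True:
        # Rates of the four event types (i)-(iv) described in the text.
        rates = [eps, nu, lam * i * s, mu * i]
        total = sum(rates)
        # Interval to the next event: exponential with mean 1 / total.
        t += rng.expovariate(total)
        if t > t_max:
            break
        event = rng.choices(range(4), weights=rates)[0]
        if event == 0:      # (i) infection enters from outside
            i += 1
        elif event == 1:    # (ii) a new susceptible enters
            s += 1
        elif event == 2:    # (iii) a susceptible is infected
            i += 1
            s -= 1
        else:               # (iv) an infected individual is removed
            i -= 1
        path.append((t, s, i))
    return path
```

Because the rate of event (iii) is zero when S = 0 and that of (iv) is zero when I = 0, neither count can become negative.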

How far the observed quasi-periodic character of such epidemics
as measles over a large continuous area such as a town
tends to be maintained by a stochastic mechanism of the above
kind is still a matter of investigation. An examination of actual
records confirms that infection may become temporarily absent
from relatively large portions of an urban area; it can now,
however, be transmitted from neighbouring districts, and a
complete stochastic model must include such possible transmissions.
The extent of the effective 'isolation' of the various
local groups or districts will obviously be a relevant factor, for
with sufficient isolation each group would approximate to the
case treated above. We can at least set up a general mathematical
formalism for studying this rather complex problem. We
replace the p.g.f. $\Pi$ of equation (1) by the functional $\Pi_t(z(r), w(r))$,
where r denotes the spatial coordinates of any individual; and
we assume possible transitions: $w(r) \to 1$ at rate $\mu$, $1 \to z(r)$ at
rate $\nu\,dr$, $w(s)z(r) \to w(s)w(r)$ at rate $\lambda$, where $\lambda$ is a function
$\lambda(r - s)$ of the separation† $r - s$. Then the principles of § 3-42
lead immediately to the generalization of the 'forward'
equation (1)

$$\frac{\partial \Pi_t}{\partial t} = \mu \int (1 - w(r))\,\frac{\partial \Pi_t}{\partial w(r)}\,dr + \nu \int (z(r) - 1)\,\Pi_t\,dr + \iint \lambda(r - s)\,w(s)\,(w(r) - z(r))\,\frac{\partial^2 \Pi_t}{\partial w(s)\,\partial z(r)}\,ds\,dr. \qquad (23)$$

In the situation when infection is almost absent in an area and
the number of susceptibles may be treated temporarily as
uninfluenced by the number of infected individuals, the process
degenerates to a multiplicative process in the sense of § 3-4, and
the 'backward' equation appears a little simpler. Thus in place
of (20) we may write for $\Pi_t(r)$, for one initial infected individual
at r,

$$\frac{\partial \Pi_t(r)}{\partial t} = \mu\,(1 - \Pi_t(r)) + \int n(s)\,\lambda(r - s)\,\Pi_t(r)\,(\Pi_t(s) - 1)\,ds, \qquad (24)$$

where n(s) denotes the initial density‡ of susceptibles. In
particular, equation (24) includes the equation for the mean
density of infected individuals,

$$\frac{\partial m(r)}{\partial t} = -\mu\,m(r) + \int n(s)\,\lambda(r - s)\,m(s)\,ds. \qquad (25)$$


† Cf. the distance function introduced, in a somewhat similar context, by
Rapoport (1951).
‡ Not, of course, strictly a density function, as the initial number of
susceptibles is finite.

This equation is an equation for the mean density m(r) at any
arbitrary fixed point, given one initial individual at r, but with
assumptions of homogeneity and isotropy for n and $\lambda$, it is also
the equation for the mean density at r for one initial individual
at any arbitrary point, say the origin (or by superposition for any
arbitrary initial distribution). To solve it in this case we may take
the bivariate Fourier transform $M(\theta_1, \theta_2)$ of m(r) with respect to
r, and so obtain

$$M = e^{(n\Lambda - \mu)t}, \qquad (26)$$

where $\Lambda$ is the transform of $\lambda(r)$. While it has been previously
stressed that the mean density is not to be taken as necessarily
representative of the actual numbers, we may reasonably suppose
that for large n the chance of extinction is confined to an initial
period when the mean density is almost everywhere zero, and
under such conditions the solution (26) can be quite informative.

[Fig. 6. Extract from mock epidemic series simulating successive measles
outbreaks in a boarding school. Vertical axis: number of susceptibles;
horizontal axis: time, marked Sept., Jan., Apr. in successive years. The
graph shows a major epidemic beginning in the autumn of the second year
shown and ending in [...] outbreak [...] the following autumn. The dotted
lines indicate the beginning of terms, when there are a number of new
entrants to the 'school'; the arrows indicate dates when infection also
entered. (Reprinted from Applied Statistics, 2 (1953), 62.)]

Asymptotically, in the particular case of purely local infection,
we may put $\Lambda t \sim \lambda t - vt(\theta_1^2 + \theta_2^2)$, and (26) represents the Gaussian
distribution solution of a diffusion equation, with multiplying
term $\alpha = n\lambda - \mu$. It is known (cf., for example, Fisher, 1937) that
this solution implies a wave of propagation with velocity
$2\sqrt{nv\alpha}$. In the case when $n\lambda$ and $\mu$ are about equal, however,
we can no longer expect the mean density solution (26) to be
particularly useful, and a further study of equation (24) is
desirable. Thus $p_t$, the chance of total extinction, may be obtained
by putting w(u) = 0 in (24). This gives, with the homogeneity
condition $p_t(r) = p_t$,

$$\frac{dp_t}{dt} = \mu\,(1 - p_t) + n\lambda\,p_t\,(p_t - 1), \qquad (27)$$

where $\lambda = \Lambda(0)$; whence

$$p_t = \frac{\mu\,(e^{(n\lambda - \mu)t} - 1)}{n\lambda\,e^{(n\lambda - \mu)t} - \mu}. \qquad (28)$$
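
A short Python sketch may help in checking the extinction chance for one initial infected individual when new infections arise at rate $n\lambda$ and removals at rate $\mu$, as in equation (27). The function name and the explicit treatment of the critical case $n\lambda = \mu$ are additions for illustration; the closed form used is the solution of (27) that vanishes at t = 0.

```python
import math

def extinction_chance_by_time(t, n_lambda, mu):
    """Chance that the infection is extinct by time t, starting from one
    infected individual, with infection rate n_lambda and removal rate mu."""
    growth = n_lambda - mu
    if growth == 0.0:
        # Critical case: (27) reduces to dp/dt = mu * (1 - p)**2.
        return mu * t / (1.0 + mu * t)
    e = math.exp(growth * t)
    return mu * (e - 1.0) / (n_lambda * e - mu)
```

For $n\lambda > \mu$ the value tends to $\mu/(n\lambda)$ as t increases, which agrees with the threshold result quoted earlier for j = 1.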


The solution for the second-order 'product-density' $f_2$, or other
higher-order fluctuation formulae, may also be obtained, though
more conveniently from the 'forward' equation equivalent to
(24) (for further details, see Bartlett, 1954).

Chapter 5
LIMITING STOCHASTIC OPERATIONS 


5-1 Stochastic convergence 

So far in this book we have hardly needed to consider direct 
limiting operations on the random variables and functions them- 
selves, although of course we have encountered asymptotic and 
limiting properties for associated probabilities and probability 
distributions. Direct limiting operations on a stochastic process 
X(t) include differentiation and integration, the possible equi- 
valence of X(t) to series expansions, and so on. A full discussion 
of these problems is included in Moyal (M), and our purpose in 
the present chapter is merely to indicate these methods and 
ideas to an extent sufficient for most applications. 

In this first section we summarize the three main modes of 
convergence of a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ to
a stochastic limit X, and define the particular mode of convergence
mainly used in subsequent chapters (for a more detailed
account, see Fréchet, 1937-8, Chapter 5).

Convergence in probability (i.p.). $X_n$ converges to X in probability
as $n \to \infty$ if, for any positive $\epsilon$,

$$\lim_{n \to \infty} P\{|X_n - X| > \epsilon\} = 0. \qquad (1)$$

A necessary and sufficient condition for such convergence is
that for any positive $\epsilon$ and $\eta$ there is an $n_0$ such that

$$P\{|X_n - X_m| > \epsilon\} < \eta \quad \text{for all } n, m > n_0. \qquad (2)$$

An alternative necessary and sufficient condition for $X_n \to X$
i.p. is that the distribution function (d.f.) $G_n(y)$ of $Y_n = X_n - X$
tends to that of Y = 0. This limiting d.f. of $Y_n$ is sometimes
denoted by $\epsilon(y)$, and is the proper function which is the integral
of the improper Dirac $\delta$-function $\delta(y)$. Note that it is necessary
but not sufficient for $X_n \to X$ i.p. that the d.f. $F_n(x)$ of $X_n$ tends
to F(x) uniformly at all points of continuity of F(x).

The limit X is unique in the sense that if $X_n \to X$ i.p. and
$X_n \to Z$ i.p. then X = Z almost certainly (i.e. $P\{X - Z = 0\} = 1$).

Almost certain (a.c.) or strong convergence. A stronger criterion
of convergence (implying convergence i.p. but not conversely)
is of particular importance in the probability limit theorems
known as the 'laws of large numbers'. This is defined in terms
of the entire sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$.
Regarding such a sequence as a new random variable with realized
value $x_1, x_2, \ldots, x_n, \ldots$, we may say that this realized sequence
either does or does not converge in the ordinary sense to a limit x.
If the probability that it does so is unity, then we say that
$X_n \to X$ almost certainly. This criterion is equivalent to what is
called strong convergence in the theory of probability, defined
by the condition that for any positive $\epsilon$

$$\lim_{n \to \infty} P\{|X_m - X| > \epsilon \text{ for at least one } m \ge n\} = 0. \qquad (3)$$

The equivalent condition not depending explicitly on X is that
for any positive $\epsilon$ and $\eta$, there is an $n_0$ such that

$$P\{|X_m - X_n| > \epsilon \text{ for at least one } m \ge n\} < \eta \qquad (4)$$

for all $n > n_0$. As a.c. convergence depends on the simultaneous
behaviour of $X_n$ for all $n > n_0$, it is obviously more difficult to
handle, but the following sufficient criterion is useful. If

$$\sum_n E\{|X_n - X|^p\} < \infty \qquad (5)$$

for some p > 0, then $X_n \to X$ a.c. An alternative condition is

$$\sum_n E\{(|X_{n+1} - X_n|/\epsilon_n)^p\} < \infty \quad \Big(\text{where } \sum_n \epsilon_n < \infty\Big). \qquad (6)$$


Convergence ‘in the mean’ or ‘in mean square’ (m.s.). A third 
criterion we shall find particularly useful in stochastic process 
theory is analogous to the idea of ‘convergence in the mean’ in 
analysis, and is defined in general by the condition 


$$\lim_{n \to \infty} E\{|X_n - X|^p\} = 0 \quad (p > 0), \qquad (7)$$

though we shall only (unless otherwise stated) consider the case
p = 2. We shall then sometimes write (7) in the notation employed
in analysis:

$$\underset{n \to \infty}{\mathrm{l.i.m.}}\; X_n = X. \qquad (8)$$

However, owing to the danger of confusion in the use of the phrase
'in the mean' in statistics, we shall more often refer to (8) as
convergence in mean square (or m.s. convergence).

The necessary and sufficient condition for m.s. convergence 
corresponding to the condition (2) for convergence i.p., or (4) 
for a.c. convergence, is 


$$E\{|X_n - X_m|^2\} < \epsilon \quad \text{for all } n, m > n_0. \qquad (9)$$


It may have been noticed by the reader that we have not here 
attempted to justify the existence of the limiting random variable 
X, though the equivalent conditions for convergence stated
purely in terms of the sequence variables $X_n$ will suggest that
this is possible. Moreover, it is easily shown that if $X_n \to X$,
either a.c. or m.s., then $X_n \to X$ i.p. (but not conversely); as the
i.p. limit X is unique, so is the a.c. or m.s. limit.

There is, however, no implication of a.c. convergence in m.s. 
convergence, nor conversely. This may be illustrated by an
example. Let $X_n = n$ with probability $1/n^2$, and 0 with probability
$1 - 1/n^2$. Then obviously

$$\lim_{n \to \infty} E\{|X_n - 0|^2\} = 1,$$

so that $X_n$ does not converge in m.s. to 0 (there is clearly no other
possible limit independent of n). But

$$P\{X_n > \epsilon \text{ for at least one } n \ge n_0\} \le \sum_{n=n_0}^{\infty} P\{X_n > \epsilon\} = \sum_{n=n_0}^{\infty} \frac{1}{n^2}, \qquad (10)$$

which converges, so that

$$\lim_{n_0 \to \infty} P\{X_n > \epsilon \text{ for at least one } n \ge n_0\} = 0,$$

i.e. $X_n \to 0$ a.c. (The inequality in (10) follows from the addition
law of probability for events $E_1$ and $E_2$, not mutually
exclusive, namely,

$$P\{E_1 \text{ or } E_2\} = P\{E_1\} + P\{E_2\} - P\{E_1 \text{ and } E_2\} \le P\{E_1\} + P\{E_2\}.)$$

Suppose alternatively that $X_1, X_2, \ldots$ is a sequence
of independent random variables such that $X_n = 1$ with probability
$1/n$ and 0 with probability $1 - 1/n$. Then

$$\lim_{n \to \infty} E\{|X_n - 0|^2\} = 0,$$

and l.i.m. $X_n = 0$. But

$$P\{X_n = 0 \text{ for every } n \ge n_0\} = \Big(1 - \frac{1}{n_0}\Big)\Big(1 - \frac{1}{n_0 + 1}\Big)\Big(1 - \frac{1}{n_0 + 2}\Big) \cdots = 0,$$

whatever $n_0$, so that $X_n$ does not converge a.c. to zero.
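
The two counter-examples can be checked by direct calculation. The Python sketch below is illustrative only (function names and truncation points are assumptions): it evaluates the second moments, a truncated version of the bound on the right of (10), and the finite product for the second example, which telescopes to $(n_0 - 1)/n_1$.

```python
import math

def example1_second_moment(n):
    """E{X_n^2} when X_n = n with probability 1/n^2 and 0 otherwise."""
    return n ** 2 * (1.0 / n ** 2)

def example1_tail_bound(n0, n_max=10 ** 5):
    """Truncated sum of 1/n^2 from n0 to n_max, bounding the chance of any
    non-zero X_n in that range."""
    return sum(1.0 / n ** 2 for n in range(n0, n_max + 1))

def example2_second_moment(n):
    """E{X_n^2} when X_n = 1 with probability 1/n and 0 otherwise."""
    return 1.0 / n

def example2_prob_all_zero(n0, n1):
    """Chance that independent X_n are all zero for n0 <= n <= n1."""
    return math.prod(1.0 - 1.0 / n for n in range(n0, n1 + 1))
```

In the first example the second moment stays at unity while the tail bound shrinks; in the second the second moment shrinks while the chance of an unbroken run of zeros tends to zero as the run is lengthened.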

As a.c. convergence and m.s. convergence imply convergence 
i.p. they imply automatically the uniform convergence of the 
d.f. $F_n(x)$ to a limit F(x) at all points of continuity of F(x).

Application to stochastic sums. If we encounter a random
variable defined as

$$X = \sum_{u=0}^{\infty} g_u Y_u, \qquad (11)$$

the question of its convergence naturally occurs. The simplicity
of m.s. convergence arises from its equivalence to ordinary limit
properties for the second moment (always assumed finite when
we make use of m.s. convergence). Thus condition (9) is then
(for $X_n$ real) equivalent to the condition

$$\lim_{n, m \to \infty} E\{X_n X_m\} = \mu_2 \quad (\mu_2 \ge 0). \qquad (12)$$

Applying this condition to (11), where we assume at present
$g_u$, $Y_u$ real, we have

$$\lim_{n, m \to \infty} \sum_{u=0}^{n} \sum_{v=0}^{m} g_u g_v E\{Y_u Y_v\} = \mu_2. \qquad (13)$$

A sufficient condition for (13) to be true is that $\sum_u |g_u| \sqrt{E\{Y_u^2\}}$
should converge. In the particular case when the $Y_u$ are uncorrelated
with zero mean and the same variance $\sigma^2$, (13)
obviously becomes

$$\lim_{n \to \infty} \sum_{u=0}^{n} g_u^2 = \mu_2/\sigma^2, \qquad (14)$$

so that the convergence of the sum in (14) is all that is needed.

Extension to random functions. We may extend these convergence
definitions, as in analysis, to refer to functions, and consider
the stochastic convergence of a random function X(t), where t has
a continuous range, to $X(\tau)$ as $t \to \tau$. For example, we say X(t)
is m.s. continuous at $t = \tau$ if

$$\underset{t \to \tau}{\mathrm{l.i.m.}}\; X(t) = X(\tau),$$

i.e. if

$$\lim_{t \to \tau} E\{|X(t) - X(\tau)|^2\} = 0. \qquad (15)$$

This condition may be written (for real X(t))

$$\lim_{t, s \to \tau} E\{X(t)X(s)\} \equiv \lim_{t, s \to \tau} w(t, s) = w(\tau, \tau) \equiv E\{X^2(\tau)\}. \qquad (16)$$

Notice that if (15) is true, we must necessarily have (as
$|E\{Z\}|^2 \le E\{|Z|^2\}$ for any random variable Z)

$$E\{\underset{t \to \tau}{\mathrm{l.i.m.}}\; X(t)\} = E\{X(\tau)\} = \lim_{t \to \tau} E\{X(t)\}, \qquad (17)$$

so that the m.s. limiting operation and the expectation sign
commute.

In the case of a.c. convergence, the extension implies a statement
about 'almost all' realized functions x(t) as $t \to \tau$, and the
kind of difficulties referred to in § 1-3 will arise. In particular, the
stochastic convergence of $X(t_n)$ to $X(\tau)$ for any arbitrary
sequence $t_n \to \tau$, while the relevant property for investigating
convergence i.p. or m.s. of X(t) to $X(\tau)$, is no longer sufficient
for a.c. convergence. Moreover, continuity i.p. (or m.s.) at every
point t in an entire interval (0, T) defines continuity i.p. (or m.s.)
over the interval, but this is not true for a.c. continuity. An
example is the additive process with unit jumps; this is a.c.
continuous at every point, but the chance of at least one jump
in any non-zero interval is greater than zero; it is a.c. continuity
over an entire interval which is the more useful concept. A
sufficient condition for this is

$$E\{|X(t + h) - X(t)|^2\} < C\,|h|^{1+\alpha} \quad (C, \alpha > 0), \qquad (18)$$

for all t, t + h in (0, T).


5-11 Stochastic differentiation and integration. We
next extend these ideas to differentiation and integration. Thus
$\dot{X}(t)$ is said to be the m.s. differential coefficient of X(t) if

$$\lim_{h \to 0} E\left\{\left|\frac{X(t + h) - X(t)}{h} - \dot{X}(t)\right|^2\right\} = 0, \qquad (1)$$

and it is easily shown from the principles indicated in § 5-1 that
an equivalent condition for this is that w(t, s) has partial
derivatives $\partial w/\partial t$, $\partial w/\partial s$, $\partial^2 w/\partial t\,\partial s$ (the latter in the symmetric or
complete sense) at t = s. If $\dot{X}(t)$ exists at all points t, then these
partial derivatives exist on the whole line t = s, and it may be
shown further that they exist over the whole t, s plane. We then
also have

$$E\{\dot{X}(t)\} = \frac{\partial}{\partial t} E\{X(t)\}, \quad E\{\dot{X}(t)X(s)\} = \frac{\partial w(t, s)}{\partial t}, \quad E\{\dot{X}(t)\dot{X}(s)\} = \frac{\partial^2 w(t, s)}{\partial t\,\partial s}. \qquad (2)$$

By repeating the argument for $\dot{X}(t)$, we obtain the further
conditions for $\ddot{X}(t)$ to exist, and so on. If also X(t) is stationary,
so that w(t, s) is a function only of t − s, it is necessarily differentiable
for all t if differentiable at one point. For stationary
stochastic processes with

$$w(t - s) = \cos|t - s|, \quad \exp\{-|t - s|\}, \quad \exp\{-(t - s)^2\}$$

(that such processes exist is shown in the next chapter), we see
that the first and last are differentiable m.s. indefinitely, whereas
the second is not even differentiable once. It is, however, m.s.
continuous, corresponding to the ordinary continuity at $\tau = 0$
of $\exp\{-|\tau|\}$.
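
The contrast between the three examples shows up numerically in the mean square of the difference quotient, which for a stationary process equals $2[w(0) - w(h)]/h^2$. The Python sketch below is illustrative only; the dictionary keys and function name are assumptions made for the example.

```python
import math

# The three stationary product-moment functions quoted in the text.
COVARIANCES = {
    "cos": lambda tau: math.cos(abs(tau)),
    "exp_abs": lambda tau: math.exp(-abs(tau)),
    "gauss": lambda tau: math.exp(-tau * tau),
}

def ms_difference_quotient(w, h):
    """E{[(X(t + h) - X(t)) / h]^2} for a stationary process with
    product-moment function w(tau) = E{X(t + tau) X(t)}."""
    return 2.0 * (w(0.0) - w(h)) / (h * h)
```

As h decreases, the quotient settles near 1 for the cosine and near 2 for the Gaussian form, but grows like 2/h for $\exp\{-|\tau|\}$, so that no m.s. derivative exists in that case.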

It is possible from (18) of § 5-1 to obtain sufficient: conditions 
for a.c. differentiability; it is found that X(t) is a.c. continuous 
if m.s. differentiable, a.c. differentiable once if m.s. differentiable 
twice, and so on. In particular, if X(t) is m.s. differentiable 
indefinitely, it is a.c. differentiable indefinitely. We may go 
further and define analytic random functions X(t) by taking t 
and hence X(t) as complex. We then define 


$$w(t, s) = E\{X^*(t)X(s)\}, \qquad (3)$$

where X* is the complex conjugate of X. X(t) is said to be m.s.
analytic in some region R of the t complex plane if it can be
expanded in the m.s. convergent Taylor series

$$X(t) = \sum_{n=0}^{\infty} (t - t_0)^n X^{(n)}(t_0)/n! \qquad (4)$$

for t, $t_0$ in R, where $X^{(n)}(t_0)$ is the nth m.s. derivative of X(t) at $t_0$.
The necessary and sufficient condition for (4) is that w(t, s) is
analytic in the corresponding region of both t and s. Moreover,
X(t) is now also a.c. differentiable to any order; it may be shown
that the series for X(t) in (4) also converges a.c. to X(t), so that
X(t) is also a.c. analytic.

It has been usual in the statistical theory of turbulence to
assume differentiability to any order for the random velocity
U(x) at a point x of a fluid, but it has not always been realized that
this automatically implies corresponding restrictions on the
covariance or correlation function $w(x, y) = E\{U(x)U(y)\}$; thus
we have already seen that we could not have

$$w(x, y) = \sigma^2 \exp\{-|x - y|\}.$$

It is possible also to define the Riemann m.s. integral

$$U = \int_a^b \phi(u)\,X(u)\,du, \qquad (5)$$

where it is assumed that $\phi(u)$ is a real bounded function, piecewise
continuous in the ordinary sense. The existence of such an
integral is established in the usual way as the limit of a sum $U_n$
obtained from a finite number of points $u_1, \ldots, u_n$, and as $U_n$ is
a random variable it may have a stochastic limit as the differences
$u_r - u_{r-1} \to 0$. The necessary and sufficient condition for
$U_n \to U$ m.s. is that

$$E\{U^2\} = \int_a^b \int_a^b \phi(u)\,\phi(v)\,w(u, v)\,du\,dv \qquad (6)$$

exists in the ordinary Riemann sense. Then also

$$E\{UV\} = \int_a^b \int_a^b \phi(u)\,\psi(v)\,E\{X(u)Y(v)\}\,du\,dv$$

if

$$V = \int_a^b \psi(v)\,Y(v)\,dv$$

is similarly defined and exists. A Riemann-Stieltjes integral

$$U = \int_a^b \phi(u)\,dY(u) \qquad (7)$$

may also be defined in the m.s. sense if and only if

$$E\{U^2\} = \int_a^b \int_a^b \phi(u)\,\phi(v)\,d_u d_v\,\nu(u, v) \qquad (8)$$

exists as an ordinary Riemann-Stieltjes integral, where

$$\nu(u, v) = E\{Y(u)Y(v)\}.$$

Another extension of (5) is the integral

$$U = \int_a^b X(u)\,d\Phi(u), \qquad (9)$$

if the formal integral for $E\{U^2\}$ exists.

These definitions may readily be extended to infinite ranges 
of integration, to complex-valued functions ($w(u, v)$ being defined
as in (3)), and also to random functions in more than one real
variable. The equivalence of the m.s. property for X(u) and the
corresponding ordinary property for $w(u, v)$ in all these cases
is primarily due to the linear character of differentiation and
summation. It is also possible to define expressions like (5) and
(7) as Lebesgue and Lebesgue-Stieltjes m.s. integrals, but the
conditions (6) and (8) respectively, if now interpreted as Lebesgue
integrals, are no longer sufficient, for the question of the m.s.
measurability of X(u) in the sense of measure-theory also arises,
and is not necessarily covered by the measurability of $w(u, v)$.
However, in practice the distinction between the two types of 
integral will hardly ever arise; a sufficient integrability condition 
usually adequate to cover this point is that the integral in (7) 
exists and has the same value in either the Riemann or Lebesgue 
sense if Y(u) is of m.s. bounded variation in (a,b) (a condition 
equivalent to v(u, v) being of bounded variation over the region 
a<u<b,a<v<b). 

While the subsequent use of stochastic integrals or other 
stochastic limiting operations will often remain formal, the 
above remarks may help to indicate that a complete mathematical
basis is available. Thus the rigorous existence of integrals like
(7) or (9) enables a more complete definition of the characteristic
functional defined in § 1-31 to be made. Results which hold for
non-limiting linear operations, or for non-stochastic limiting
operations will usually continue to hold, though sometimes
perhaps requiring some further regularity conditions. For
example, it may be proved that if X(t) is a normal process,
limiting linear combinations of X(t), like $\dot{X}(t)$ or $\int X(t)\,dt$, are
normally distributed. A special case of the integral (7), important
in the theory of stationary processes (Chapter 6), occurs when
$\Delta Y(u)$ is uncorrelated with $\Delta' Y(u)$ for two disjoint intervals
$\Delta u$, $\Delta' u$, so that

$$d_u d_v\,\nu(u, v) = \begin{cases} 0 & (u \ne v), \\ d\sigma^2(u) & (u = v). \end{cases} \qquad (10)$$

Y(t) is then said to be an orthogonal process. The integral (8)
then reduces to the single integral

$$E\{U^2\} = \int_a^b \phi^2(u)\,d\sigma^2(u).$$

More generally, when $\phi(u)$ and Y(u) are complex, equation (8)
becomes

$$E\{UU^*\} = \int_a^b \int_a^b \phi^*(u)\,\phi(v)\,d_u d_v\,\nu(u, v), \qquad (11)$$

and in particular for Y(u) orthogonal, i.e.

$$E\{\Delta Y(u)\,\Delta'[Y^*(u)]\} = 0$$

for $\Delta u$, $\Delta' u$ disjoint,

$$E\{UU^*\} = \int_a^b \phi^*(u)\,\phi(u)\,E\{dY^*(u)\,dY(u)\}. \qquad (12)$$

Expansion theorems. In the above integrals we have supposed
U defined in terms of a given random function X(u) or Y(u); if
we more generally write $\phi(u, t)$ in place of $\phi(u)$, we should have
a new random function

$$U(t) = \int_R \phi(u, t)\,dY(u), \qquad (13)$$

where R denotes the range of integration of u. An important
general expansion theorem states that a converse result holds,
namely, that if the product or bilinear moment w(t, s) of U(t) is
expressible in the form

$$w(t, s) = \int_R \int_R \phi^*(u, t)\,\phi(v, s)\,d_u d_v\,\nu(u, v), \qquad (14)$$

where $\nu(u, v)$ is a valid product-moment function, then there
exists an expansion of U(t) in terms of a random function Y(u)
of m.s. bounded variation given by (13), such that

$$\nu(u, v) = E\{Y^*(u)Y(v)\}.$$

(The integrals (13) and (14) are in general Lebesgue-Stieltjes
integrals, but it has already been noted that if $\phi(u, t)$ is piecewise
continuous in u equivalent Riemann-Stieltjes integrals
exist.) The expansion (13) is, moreover, unique if the set of
functions $\phi(u, t)$ possesses appropriate completeness properties
(this means that $\iint \psi(v)\,\phi^*(u, t)\,d_u d_v\,\nu(u, v) = 0$ for all t implies
$\iint \psi^*(u)\,\psi(v)\,d_u d_v\,\nu(u, v) = 0$);
Y(u) can then be uniquely obtained by an inversion of (13).
The most important case giving a unique expansion is $\phi(u, t) = e^{itu}$.
A special case of the general theorem occurs if Y(u) is orthogonal,
so that (14) reduces to

$$w(t, s) = \int_R \phi^*(u, t)\,\phi(u, s)\,d\sigma^2(u), \qquad (15)$$

where $\sigma^2(u) = E\{Y^*(u)Y(u)\}$ is a non-decreasing positive function
of u.


5-2 Stochastic linear difference and differential 
equations 


It is often convenient to study a stochastic process defined 
directly by a stochastic equation. As a simple example consider
the random sequence defined by the equation

$$X_r = \rho X_{r-1} + Y_r, \qquad (1)$$

where $Y_r$ is a sequence of uncorrelated variables with zero means
and common variance $\sigma^2$. Solving (1) by iteration, we obtain

$$X_r = \rho^{r-u} X_u + \sum_{s=0}^{r-u-1} \rho^s Y_{r-s}, \qquad (2)$$

if $X_r = X_u$ at initial time u. As $u \to -\infty$, this gives a stable solution
independent of $X_u$ if $|\rho| < 1$. This solution is

$$X_r = \sum_{s=0}^{\infty} \rho^s Y_{r-s}, \qquad (3)$$

which defines a valid (m.s.) process $X_r$ with finite variance, as

$$\sum_{s=0}^{\infty} \rho^{2s} = 1/(1 - \rho^2) < \infty.$$

A slightly more complicated case is

$$X_r + aX_{r-1} + bX_{r-2} = Y_r, \qquad (4)$$

which is conveniently solved by putting

$$X_r = \frac{1}{1 + aE^{-1} + bE^{-2}}\,Y_r = \frac{1}{(1 - \mu_1 E^{-1})(1 - \mu_2 E^{-1})}\,Y_r = \frac{1}{\mu_1 - \mu_2}\left[\frac{\mu_1}{1 - \mu_1 E^{-1}} - \frac{\mu_2}{1 - \mu_2 E^{-1}}\right] Y_r,$$

where $E = 1 + \Delta$ is the usual displacement operator in the calculus
of finite differences and $\mu_1$ and $\mu_2$ are the roots of the
equation $E^2 + aE + b = 0$. We thus obtain the general solution

$$X_r = A\mu_1^{r-u} + B\mu_2^{r-u} + \sum_{s=0}^{r-u-1} \frac{\mu_1^{s+1} - \mu_2^{s+1}}{\mu_1 - \mu_2}\,Y_{r-s},$$

where A and B are obtained from the values $X_u$, $X_{u-1}$.
Again, if $|\mu_1|$, $|\mu_2| < 1$, and we let $u \to -\infty$, we obtain

$$X_r = \sum_{s=0}^{\infty} \frac{\mu_1^{s+1} - \mu_2^{s+1}}{\mu_1 - \mu_2}\,Y_{r-s}, \qquad (5)$$

a valid process with variance

$$\sigma_X^2 = \sigma^2 \sum_{s=0}^{\infty} \left(\frac{\mu_1^{s+1} - \mu_2^{s+1}}{\mu_1 - \mu_2}\right)^2 < \infty. \qquad (6)$$
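
The weights appearing in the first-order solution (3) and the second-order solution (5) are easily tabulated. The Python sketch below is illustrative only (the function names and the truncation of the infinite sums are assumptions); it also allows a check that the second-order weights satisfy the homogeneous form of equation (4).

```python
def ar1_weights(rho, count):
    """Weights rho**s of Y_{r-s} in the first-order solution (3)."""
    return [rho ** s for s in range(count)]

def ar2_weights(mu1, mu2, count):
    """Weights of Y_{r-s} in the second-order solution (5), for distinct
    roots mu1, mu2 of E^2 + a E + b = 0."""
    return [(mu1 ** (s + 1) - mu2 ** (s + 1)) / (mu1 - mu2) for s in range(count)]

def variance_from_weights(weights, sigma2=1.0):
    """Variance sigma2 * sum(g_s^2) of the process, truncated to the
    weights supplied (adequate when the weights decay geometrically)."""
    return sigma2 * sum(g * g for g in weights)
```

With rho = 0.6 the truncated sum reproduces $1/(1 - \rho^2)$ closely, and with roots 0.5 and −0.3 the weights begin 1, 0.2, ... and satisfy $g_s + a g_{s-1} + b g_{s-2} = 0$ for $s \ge 2$.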


The further statistical properties of $X_r$ may also be studied
either from its solution or from its equation. Thus
for the covariance w(r, s) of $X_r$ and $X_s$ we obtain from
(1) by multiplying by $X_s$ and averaging ($E\{X_r\} = 0$ for (3), and
also for (2) if $E\{X_u\} = 0$)

$$w(r, s) = \rho\,w(r - 1, s) \quad (s < r), \qquad (7)$$

whence

$$w(r, s) = \rho^{r-s} E\{X_s^2\}.$$

In case (3), $E\{X_s^2\}$ does not depend on s, and by squaring both
sides of (1) and averaging,

$$\sigma_X^2 \equiv E\{X_r^2\} = \rho^2 \sigma_X^2 + \sigma^2,$$

whence

$$w(r, s) = \sigma^2 \rho^{r-s}/(1 - \rho^2), \qquad (8)$$

a result alternatively obtainable from (3). Similarly from (4)

$$w(r, s) + a\,w(r - 1, s) + b\,w(r - 2, s) = 0 \quad (s < r), \qquad (9)$$

whence

$$w(r, s) = A(s)\,\mu_1^{r-s} + B(s)\,\mu_2^{r-s}.$$

By putting r = s, we must have, corresponding to the solution
(5), $\sigma_X^2$ not depending on s, so that A(s) and B(s) do not depend
on s. By squaring (4) and averaging, we obtain

$$\sigma_X^2 = A + B,$$

where $\sigma_X^2$ is given by (6). To obtain another equation for A and
B we take s = r − 1, giving

$$(1 + b)\,w(r, r - 1) + a\sigma_X^2 = 0.$$

Hence we find, putting $a = -(\mu_1 + \mu_2)$, $b = \mu_1\mu_2$,

$$A = \frac{\sigma_X^2\,\mu_1 (1 - \mu_2^2)}{(1 + \mu_1\mu_2)(\mu_1 - \mu_2)}, \qquad B = \frac{\sigma_X^2\,\mu_2 (1 - \mu_1^2)}{(1 + \mu_1\mu_2)(\mu_2 - \mu_1)}. \qquad (10)$$
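
The constants in (10) may be checked against the covariance computed directly from the weights of the solution (5), since for uncorrelated $Y_r$ the covariance at lag k is $\sigma^2$ times the sum of products of weights k apart. The Python sketch below is illustrative only; the function names and the truncation length are assumptions made for the example.

```python
def ar2_weights(mu1, mu2, count):
    """Weights of Y_{r-s} in solution (5) for distinct roots mu1, mu2."""
    return [(mu1 ** (s + 1) - mu2 ** (s + 1)) / (mu1 - mu2) for s in range(count)]

def covariance_from_weights(weights, lag, sigma2=1.0):
    """sigma2 * sum_u g_u g_{u+lag}, truncated to the weights supplied."""
    return sigma2 * sum(weights[u] * weights[u + lag]
                        for u in range(len(weights) - lag))

def covariance_from_roots(mu1, mu2, lag, var_x):
    """A * mu1**lag + B * mu2**lag with A and B taken from (10)."""
    denom = (1.0 + mu1 * mu2) * (mu1 - mu2)
    coeff_a = var_x * mu1 * (1.0 - mu2 ** 2) / denom
    coeff_b = -var_x * mu2 * (1.0 - mu1 ** 2) / denom
    return coeff_a * mu1 ** lag + coeff_b * mu2 ** lag
```

Taking `var_x` as the lag-zero value from the weights, the two routes should agree at every lag when $|\mu_1|$, $|\mu_2| < 1$.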

The extension of these methods to higher-order linear difference
equations of the type (4) is evident. The solutions are all of the
type

$$X_r = \sum_{s=0}^{\infty} g_s Y_{r-s}, \qquad (11)$$

if we assume the process was 'started up' a long time ago. From
the principles of the last section, this represents a well-defined
solution (for $E\{Y_r\} = 0$, $E\{Y_r Y_s\} = 0$ when $r \ne s$) if

$$\sum_{s=0}^{\infty} g_s^2 < \infty. \qquad (12)$$

Equations of the type (1) and (4) are called autoregressive
equations, and it is often assumed further that the $Y_r$'s are
entirely independent with the same distribution. With this
further assumption (11) will be called a linear process. In this
case let the joint distribution of $X_r, X_{r+1}, X_{r+2}, \ldots$ have
characteristic function with logarithm

$$K[i\phi_0, i\phi_1, i\phi_2, \ldots] = \log E\{\exp[i(\phi_0 X_r + \phi_1 X_{r+1} + \phi_2 X_{r+2} + \ldots)]\}.$$

Substituting from (11) and remembering that the Y's are independent,
we find

$$K = \sum_{s=-\infty}^{\infty} K_Y(i\phi_0 g_{r-s} + i\phi_1 g_{r+1-s} + i\phi_2 g_{r+2-s} + \ldots) \quad (g_s = 0 \text{ for } s < 0),$$

where $K_Y(i\phi)$ is the cumulant function for Y. This is equivalent
to

$$K = \sum_{u=-\infty}^{\infty} K_Y(i\phi_0 g_u + i\phi_1 g_{u+1} + i\phi_2 g_{u+2} + \ldots), \qquad (13)$$

and, as it does not depend on r, indicates that $X_r$ is a stationary
process. In particular, we have the results for the second, third
and fourth cumulants (the latter two when they exist) connecting
$X_r$, $X_{r+s}$, $X_{r+t}$, $X_{r+v}$:

$$\kappa_{11} = E\{X_r X_{r+s}\} = \sigma^2 \sum_u g_u g_{u+s}, \qquad (14)$$

$$\kappa_{111} = E\{X_r X_{r+s} X_{r+t}\} = \kappa_3 \sum_u g_u g_{u+s} g_{u+t}, \qquad (15)$$

$$\kappa_{1111} = E\{X_r X_{r+s} X_{r+t} X_{r+v}\} - E\{X_r X_{r+s}\}E\{X_{r+t} X_{r+v}\} - E\{X_r X_{r+t}\}E\{X_{r+s} X_{r+v}\} - E\{X_r X_{r+v}\}E\{X_{r+s} X_{r+t}\} = \kappa_4 \sum_u g_u g_{u+s} g_{u+t} g_{u+v}, \qquad (16)$$

where $\kappa_3$ and $\kappa_4$ are the third and fourth cumulants of Y.

The methods appropriate for continuous time are quite
analogous, but now give rise to stochastic differential equations
and integrals in place of differences and sums. There is one
further complication to consider. Suppose for illustration we
have in mind the concrete problem of a sensitive torsional
galvanometer with a damped oscillation of the type

$$\ddot{x}(t) + \alpha\dot{x}(t) + \beta x(t) = 0. \qquad (17)$$

This becomes a stochastic equation if we consider a term Z(t)
on the right-hand side representing the effect of additional
random forces, so that (17) then becomes

$$\ddot{X}(t) + \alpha\dot{X}(t) + \beta X(t) = Z(t). \qquad (18)$$

However, if these random forces are due to the random bombardment
of the moving part of the galvanometer by air molecules,
it is a more convenient assumption to suppose that the
forces are of the nature of impulses, changing the velocity $\dot{X}(t)$
discontinuously. Even if we interpret the differential coefficients
in the mean-square sense, it is easy to see that (18) is no
longer an admissible equation, and must be replaced by an equation
representing the changes in velocity $\dot{X}(t)$. We use as before
the notation df(t) for the change in f(t) in dt, and as Z(t) dt is replaced
by the (m.s.) differential of the additive process Y(t) of the
accumulated impulse effects, (18) becomes†

$$d\dot{X}(t) + \alpha\dot{X}(t)\,dt + \beta X(t)\,dt = dY(t). \qquad (19)$$


If we find the solution of this equation formally, we have

$$X(t) = A(u)\,e^{\lambda_1 (t - u)} + B(u)\,e^{\lambda_2 (t - u)} + \int_u^t \frac{e^{\lambda_1 (t - v)} - e^{\lambda_2 (t - v)}}{\lambda_1 - \lambda_2}\,dY(v), \qquad (20)$$

where $\lambda_1$ and $\lambda_2$ are the roots of $\lambda^2 + \alpha\lambda + \beta = 0$, and A(u), B(u)
are determined from the values of X(t), $\dot{X}(t)$ at t = u. We may
verify from (20) that this solution has a valid m.s. differential
coefficient $\dot{X}(t)$, such that (19) is satisfied, thus checking the
consistency of (19) with this solution. Analogously to the difference
equation solution, if $|e^{\lambda_1}|$, $|e^{\lambda_2}| < 1$, X(t) becomes independent
of X(u), $\dot{X}(u)$ as $u \to -\infty$, and the solution becomes

$$X(t) = \int_{-\infty}^t \frac{e^{\lambda_1 (t - v)} - e^{\lambda_2 (t - v)}}{\lambda_1 - \lambda_2}\,dY(v). \qquad (21)$$

Again, the extension to higher-order linear equations is evident,
and the solution (21) is more generally of the form

$$X(t) = \int_{-\infty}^t g(t - v)\,dY(v), \qquad (22)$$

validly defined (for $E\{dY(u)\} = 0$ and $E\{[dY(u)]^2\} = \sigma^2\,du$) if the
piecewise continuous function g(u) is such that

$$\int_0^{\infty} g^2(u)\,du < \infty. \qquad (23)$$
For Y(u) a homogeneous additive process, the process (22) will 
be called a linear process (cf. equation (11)), and it is readily 
found (cf. equation (13)) that its characteristic functional 


$$C\{\theta\} \equiv E\left\{\exp\left[i\int X(u)\,d\theta(u)\right]\right\}$$

is given by

$$\log C\{\theta\} = \int_{-\infty}^{\infty} K_Y\left\{i\int_{-\infty}^{\infty} g(v)\,d\theta(u+v)\right\}du \quad (g(v)=0 \text{ for } v<0), \qquad (24)$$

where

$$K_Y\{i\phi\} = \log E\left\{\exp\left[i\phi\int_0^1 dY\right]\right\}. \qquad (25)$$

From (24) the completely stationary character of the linear process (22) is apparent, and formulae parallel to (14), (15) and (16) may be deduced. In particular,

$$\sigma_X^2 = \sigma_Y^2\int_0^{\infty} g^2(u)\,du, \qquad (26)$$

where $\sigma_Y^2$ is the variance from (25), a result known as Campbell's theorem, and first deduced in the context of the ‘shot effect’ in vacuum tubes, the independent potential impulses $dY(u)$ due to the electrons being damped at a later time $t$ according to the coefficient $g(t-u)$ and giving a net result at time $t$ represented by (22).

† It is often useful to think of $df$, even if it may include a non-zero jump, as defined by $\delta f = df + o(\delta t)$, but a strict interpretation of (19) involves integration over some non-zero time interval (see, for example, Moyal and Edwards, 1954).

It is also clear from the principles of m.s. convergence that the equations and solutions will hold in the m.s. sense as far as the second moments if $Y(u)$ is an orthogonal process rather than a completely additive process. This shows that the covariance function $M(t,s) \equiv E\{X(t)X(s)\}$ is a function $w(\tau)$ of $\tau \equiv t-s$; this holds in particular for $X(t)$ in (21). As in this case $X(t)$ is m.s. differentiable, we have $\partial^2 w(t-s)/\partial t\,\partial s = -\partial^2 w(\tau)/\partial\tau^2$, which therefore exists. Multiplying equation (19) by $X(s)$ $(s<t)$, and averaging, we find that $w(\tau)$ satisfies the equation

$$w''(\tau) + \alpha w'(\tau) + \beta w(\tau) = 0 \quad (\tau>0), \qquad (27)$$

where $w'(\tau) \equiv dw(\tau)/d\tau$, etc., whence

$$w(\tau) = Ae^{\lambda_1\tau} + Be^{\lambda_2\tau} \quad (\tau>0).$$

We require also $w(0) = \sigma_X^2$, $w'(0) = 0$, the second condition following from

$$\frac{d}{dt}E\{X^2(t)\} = 2E\{X(t)\dot{X}(t)\} = 0.$$

This leads finally to

$$w(\tau) = \sigma_X^2\left[\frac{\lambda_1e^{\lambda_2\tau} - \lambda_2e^{\lambda_1\tau}}{\lambda_1 - \lambda_2}\right] \quad (\tau>0). \qquad (28)$$

This result may alternatively be deduced from the general formula

$$w(\tau) = \sigma_Y^2\int_0^{\infty} g(u)\,g(u+\tau)\,du. \qquad (29)$$

Also by squaring both sides of (21) and averaging (equivalent to using formula (26)), we obtain

$$\sigma_X^2 = \sigma_Y^2/(2\alpha\beta). \qquad (30)$$
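
Formula (30) lends itself to a quick numerical check. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the weight function is taken to be g(u) = (e^{λ1 u} − e^{λ2 u})/(λ1 − λ2) with λ1, λ2 the roots of λ² + αλ + β = 0, and the integral of g² is approximated by the trapezoidal rule on a truncated range.

```python
import cmath

def impulse_response(alpha, beta):
    """Return g(u) = (e^{l1 u} - e^{l2 u}) / (l1 - l2) for the roots of l^2 + alpha*l + beta = 0."""
    disc = cmath.sqrt(alpha * alpha - 4.0 * beta)
    l1 = (-alpha + disc) / 2.0
    l2 = (-alpha - disc) / 2.0
    if l1 == l2:
        raise ValueError("repeated roots: the limiting form u * exp(l1 * u) is needed")

    def g(u):
        # For complex conjugate roots the quotient is real; .real drops rounding noise.
        return ((cmath.exp(l1 * u) - cmath.exp(l2 * u)) / (l1 - l2)).real

    return g

def integral_g_squared(alpha, beta, upper=60.0, steps=60000):
    """Trapezoidal approximation to the integral of g(u)^2 over (0, upper)."""
    g = impulse_response(alpha, beta)
    h = upper / steps
    total = 0.5 * (g(0.0) ** 2 + g(upper) ** 2)
    for k in range(1, steps):
        total += g(k * h) ** 2
    return total * h

if __name__ == "__main__":
    for alpha, beta in [(1.0, 2.0), (3.0, 2.0)]:
        print(alpha, beta, integral_g_squared(alpha, beta), 1.0 / (2.0 * alpha * beta))
```

Both the oscillatory case (complex roots) and the overdamped case (real roots) should agree with 1/(2αβ) to within the quadrature error.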


Multivariate autoregressive and linear processes. For reference 
later we note briefly some of the relevant extensions to multi- 
variate or vector processes X(t). A difference equation of pth 
order may be written 


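$$[I + A_1E^{-1} + \ldots + A_pE^{-p}]\,X_r = Y_r, \qquad (31)$$

where $A_i$ are matrices, $I$ is the unit matrix and $X_r$ and $Y_r$ are column vectors such that (a dash denoting a matrix transpose)

$$E\{Y_rY_s'\} = W\delta_{rs}, \quad E\{Y_r\} = 0, \quad \delta_{rs} = \begin{cases} 1 & (r=s), \\ 0 & (r \ne s). \end{cases}$$

In the particular case $p=1$, (31) becomes

$$[I + AE^{-1}]\,X_r = Y_r \qquad (32)$$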


as a generalization of (1). Equation (32) gives for suitable A a 
well-defined stationary solution given symbolically by 


$$X_r = [I + AE^{-1}]^{-1}\,Y_r, \qquad (33)$$

which is included in the more general expression

$$X_r = \sum_{s=0}^{\infty} G_sY_{r-s}, \qquad (34)$$

defined as a linear process if the $Y_r$ are independent for different $r$ with common distribution. The cumulant function (13) now becomes for $X_r, X_{r+1}, \ldots$ with associated vector variables $\theta_1, \theta_2, \ldots$

$$K = \sum_{u=-\infty}^{\infty} K_Y\{iG_u'\theta_1 + iG_{u+1}'\theta_2 + \ldots\} \quad (G_u = 0 \text{ for } u<0), \qquad (35)$$

where the cumulant function of $Y$ is $K_Y\{i\phi\}$. In particular,

$$V_s \equiv E\{X_rX_{r+s}'\} = \sum_{u=-\infty}^{\infty} G_uWG_{u+s}', \qquad (36)$$

where we require

$$V_0 = \sum_{u=-\infty}^{\infty} G_uWG_u' < \infty. \qquad (37)$$

From the equation (32) we obtain by multiplying to the right by $X_{r-s}'$ and averaging (for the stationary case)

$$V_s' + AV_{s-1}' = 0 \quad (s>0),$$

whence

$$V_s' = V_{-s} = (-A)^sV_0, \qquad (38)$$

a result consistent with (36), as for (32) $G_u = (-A)^u$. It should be noticed that (32), while of the first order, includes equations of higher order by appropriate definition of the variables of the process. For example, the second-order difference equation (4) may be defined as a first-order difference equation in the two variables $X_r$, $Z_r \equiv X_{r-1}$, as (4) is equivalent to

$$X_r + aX_{r-1} + bZ_{r-1} = Y_r, \qquad Z_r - X_{r-1} = 0.$$


For independent Y,, this is equivalent to saying that linear pro- 
cesses resulting from higher-order autoregressive equations are 
vector linear Markov processes. 
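
The reduction of a second-order equation to a first-order vector equation can be illustrated numerically. The following is a minimal sketch, assuming the scalar equation has the form X_r + aX_{r-1} + bX_{r-2} = Y_r and that the vector state is (X_r, Z_r) with Z_r = X_{r-1}; the function names and the matrix A = [[a, b], [-1, 0]] are choices made for this illustration.

```python
import random

def scalar_ar2(a, b, y, x_init=(0.0, 0.0)):
    """Iterate X_r = Y_r - a X_{r-1} - b X_{r-2}; x_init = (X_{-2}, X_{-1})."""
    x_prev2, x_prev1 = x_init
    out = []
    for yr in y:
        xr = yr - a * x_prev1 - b * x_prev2
        out.append(xr)
        x_prev2, x_prev1 = x_prev1, xr
    return out

def vector_ar1(a, b, y, x_init=(0.0, 0.0)):
    """Iterate the first-order form [X_r, Z_r]' + A [X_{r-1}, Z_{r-1}]' = [Y_r, 0]'."""
    A = ((a, b), (-1.0, 0.0))
    x_prev2, x_prev1 = x_init
    state = (x_prev1, x_prev2)  # (X_{r-1}, Z_{r-1}), where Z_{r-1} = X_{r-2}
    out = []
    for yr in y:
        x_new = yr - (A[0][0] * state[0] + A[0][1] * state[1])
        z_new = 0.0 - (A[1][0] * state[0] + A[1][1] * state[1])  # equals X_{r-1}
        state = (x_new, z_new)
        out.append(x_new)
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    y = [rng.gauss(0.0, 1.0) for _ in range(10)]
    print(scalar_ar2(-0.5, 0.3, y))
    print(vector_ar1(-0.5, 0.3, y))
```

Driven by the same sequence Y_r, the two recursions produce the same X_r, so the pair (X_r, Z_r) carries all the information needed to continue the process.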

For continuous time the analogue of (32) is 


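$$dX(t) + RX(t)\,dt = dY(t) \qquad (39)$$

with stationary solution (for suitable $R$)

$$X(t) = \int_{-\infty}^{t} e^{-R(t-u)}\,dY(u), \qquad (40)$$

a special case of the linear process (for $Y(u)$ additive)

$$X(t) = \int_{-\infty}^{t} G(t-u)\,dY(u). \qquad (41)$$

The characteristic functional $E\left\{\exp\left[i\int X'(u)\,d\theta(u)\right]\right\}$ of (41) has logarithm

$$\int_{-\infty}^{\infty} K_Y\left\{i\int G'(v)\,d\theta(u+v)\right\}du \quad (G(v)=0 \text{ for } v<0), \qquad (42)$$

where

$$K_Y\{i\phi\} = \log E\left\{\exp\left[i\int_0^1 \phi'\,dY(u)\right]\right\}.$$

In particular,

$$V(\tau) \equiv E\{X(t)X'(t+\tau)\} = \int_0^{\infty} G(u)\,W\,G'(u+\tau)\,du, \qquad (43)$$

where $W \equiv E\{\Delta Y(u)\,\Delta Y'(u)\}$ $(\Delta u = 1)$, and

$$V(0) = \int_0^{\infty} G(v)\,W\,G'(v)\,dv < \infty. \qquad (44)$$

From (39) we have (the dash again denoting the transpose)

$$\frac{\partial V'(\tau)}{\partial\tau} + RV'(\tau) = 0 \quad (\tau>0), \qquad V'(\tau) = e^{-R\tau}V(0), \qquad (45)$$

a particular case of (43) with $G(u) = e^{-Ru}$. As for discrete time, equation (39) includes higher-order differential equations by appropriate definition of variables. Thus for (19) we write $Z(t) \equiv \dot{X}(t)$, and for additive $Y(t)$ it becomes a vector linear Markov process in $X(t)$, $Z(t)$.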
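
The link between the vector form (39), (40) and the scalar solution (21) can be checked numerically. The sketch below assumes that (19) is written for the state (X, Z) with R = [[0, −1], [β, α]] and with the impulses entering only the second component, so that the weight attached to dY in the first component of (40) is the (1,2) element of e^{−Ru}. The function names and the series-based matrix exponential are illustrative choices, adequate only for matrices of small norm.

```python
import cmath
import math

def expm_2x2(m, terms=60):
    """Matrix exponential of a 2x2 matrix by its Taylor series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        # term_k = term_{k-1} * m / k
        term = [[sum(term[i][l] * m[l][j] for l in range(2)) / k for j in range(2)]
                for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def weight_from_matrix(alpha, beta, u):
    """(1,2) element of exp(-R u) with R = [[0, -1], [beta, alpha]]."""
    minus_Ru = [[0.0, u], [-beta * u, -alpha * u]]
    return expm_2x2(minus_Ru)[0][1]

def weight_from_roots(alpha, beta, u):
    """Scalar weight (e^{l1 u} - e^{l2 u}) / (l1 - l2), as in (21)."""
    disc = cmath.sqrt(alpha * alpha - 4.0 * beta)
    l1, l2 = (-alpha + disc) / 2.0, (-alpha - disc) / 2.0
    return ((cmath.exp(l1 * u) - cmath.exp(l2 * u)) / (l1 - l2)).real

if __name__ == "__main__":
    print(weight_from_matrix(3.0, 2.0, 1.0), weight_from_roots(3.0, 2.0, 1.0))
```

The two weights agree, which is one way of seeing that the second-order equation is a special case of the first-order vector equation.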


5-21 Relations between direct stochastic equations and distribution equations. In §5-2 distributional properties were often obtained either directly from the stochastic equation or deduced from the corresponding distributional solution. Both methods can be useful; in physical problems especially, where the velocity $U(t) \equiv \dot{X}(t)$ as well as $X(t)$ exists (as a continuous quantity in the m.s. sense, if any impulses acting on $U(t)$ are additive with total variance proportional to $t$) the two approaches will often reinforce each other.

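As a simple example consider the (one-dimensional) motion of an unrestricted particle whose velocity is subjected to numerous ‘small’ random impulses, so that the distribution of the velocity satisfies the simplest diffusion equation

$$\frac{\partial f(u)}{\partial t} = \tfrac{1}{2}\sigma^2\frac{\partial^2 f(u)}{\partial u^2}, \qquad (1)$$

with solution

$$f(u) = \frac{1}{\sqrt{2\pi\sigma^2t}}\exp\left[-\tfrac{1}{2}\frac{(u-u_0)^2}{\sigma^2t}\right]. \qquad (2)$$

The variance of $u$ from (2) is $\sigma^2t$. From the stochastic equation

$$dU(t) = dZ(t),$$

where $Z(t)$ is a normal additive process (i.e. a pure diffusion process with no non-zero transitions), we have

$$U(t) = U(0) + \int_0^t dZ(u). \qquad (3)$$

For $U(0)$ given as $u_0$, the variance of $U(t)$ in (3) is of course $\sigma^2t$, where $\sigma^2$ is the variance increment per unit time of $Z(t)$ (a result more generally true than (1) and (2), as $Z(t)$ can in (3) be any homogeneous additive process with zero mean and finite variance).

Consider next the joint distribution of $U(t)$ and

$$X(t) = X(0) + \int_0^t U(u)\,du. \qquad (4)$$

For $X(0) = x_0$, the mean of $X(t)$ is $x_0 + u_0t$ and its variance

$$E\left\{\left[\int_0^t\!\!\int_0^v dZ(u)\,dv\right]^2\right\} = E\left\{\left[\int_0^t (t-u)\,dZ(u)\right]^2\right\} = \tfrac{1}{3}\sigma^2t^3. \qquad (5)$$

Similarly the covariance of $X(t)$ and $U(t)$ will be found to be $\tfrac{1}{2}\sigma^2t^2$. To deduce these results from the diffusion equation, we have to extend this to the joint distribution of $X(t)$ and $U(t)$. Now for the characteristic function of $X(t)$ and $U(t)$ we have

$$C(\phi,\psi) \equiv E\{e^{i\phi X + i\psi U}\}, \qquad \Delta C = E\{(e^{i\phi\Delta X + i\psi\Delta U} - 1)\,e^{i\phi X + i\psi U}\},$$

and for $\Delta X/\Delta t \to U$ (and $\Delta U$ independent of $X$),

$$\frac{\partial C}{\partial t} = \left[\phi\frac{\partial}{\partial\psi} + \Psi\left(i\psi, \frac{\partial}{\partial(i\psi)}\right)\right]C, \qquad (6)$$

where

$$\Psi(i\psi, U) = \lim_{\Delta t\to0} E\left\{\frac{e^{i\psi\Delta U} - 1}{\Delta t}\,\Big|\,U\right\}$$

(cf. equation (2), §3-5). For the normal additive diffusion equation considered above we have $\Psi = -\tfrac{1}{2}\sigma^2\psi^2$; whence

$$\frac{\partial C}{\partial t} = \phi\frac{\partial C}{\partial\psi} - \tfrac{1}{2}\sigma^2\psi^2C. \qquad (7)$$

In terms of the density function $f(u,x)$ equation (7) inverts to

$$\frac{\partial f}{\partial t} + u\frac{\partial f}{\partial x} = \tfrac{1}{2}\sigma^2\frac{\partial^2 f}{\partial u^2}. \qquad (8)$$

Equation (7) may be solved by writing

$$t' = t, \quad \psi' = \psi + \phi t, \quad \phi' = \phi,$$

so that

$$\frac{\partial\log C}{\partial t'} = -\tfrac{1}{2}\sigma^2(\psi' - \phi't')^2,$$

$$\log C = -\tfrac{1}{2}\sigma^2\left[\psi'^2t' - \psi'\phi't'^2 + \tfrac{1}{3}\phi'^2t'^3\right] + i(x_0\phi' + u_0\psi')$$

$$= i[x_0\phi + u_0(\psi + \phi t)] - \tfrac{1}{2}\sigma^2\left[\psi^2t + \psi\phi t^2 + \tfrac{1}{3}\phi^2t^3\right], \qquad (9)$$

representing a joint normal distribution with the first- and second-order moments agreeing with those previously found.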
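
The second-order moments of U(t) and X(t) quoted in this example (variances σ²t and σ²t³/3, covariance σ²t²/2) can be checked by simulation. The sketch below is a rough Monte Carlo illustration only: it assumes u_0 = x_0 = 0, replaces the additive process by normal increments on a discrete grid, and uses hypothetical names and parameter values, so the agreement is approximate.

```python
import random

def simulate_moments(sigma=1.0, t=1.0, steps=100, paths=4000, seed=0):
    """Estimate var U(t), var X(t) and cov(X(t), U(t)) for dU = dZ, dX = U dt."""
    rng = random.Random(seed)
    dt = t / steps
    sd = sigma * dt ** 0.5
    xs, us = [], []
    for _ in range(paths):
        u = 0.0
        x = 0.0
        for _ in range(steps):
            x += u * dt               # X accumulates the current velocity
            u += rng.gauss(0.0, sd)   # velocity receives a small normal impulse
        xs.append(x)
        us.append(u)
    mean_x = sum(xs) / paths
    mean_u = sum(us) / paths
    var_x = sum((x - mean_x) ** 2 for x in xs) / (paths - 1)
    var_u = sum((u - mean_u) ** 2 for u in us) / (paths - 1)
    cov_xu = sum((x - mean_x) * (u - mean_u) for x, u in zip(xs, us)) / (paths - 1)
    return var_u, var_x, cov_xu

if __name__ == "__main__":
    var_u, var_x, cov_xu = simulate_moments()
    print("var U:", var_u, "(theory 1.0)")
    print("var X:", var_x, "(theory 1/3)")
    print("cov  :", cov_xu, "(theory 1/2)")
```

The estimates carry both sampling error and a small discretization bias, so they should be compared with the theoretical values only to within a few per cent.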
It has been pointed out that these moment formulae are true 


more generally for any additive process Z(t). To solve the above 


distributional problem in this more general case, consider the 
characteristic functional 


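$$C\{\theta\} \equiv E\left\{\exp\int_0^t iU(u)\,d\theta(u)\right\}.$$

From the principles of the previous section we easily find from (3), by substituting for $U(u)$ in terms of $Z(u)$ and reversing the order of integration,

$$\log C\{\theta\} = iu_0\theta(t) + \int_0^t K_Z\{i(\theta(t) - \theta(u))\}\,du \quad (\theta(0)=0), \qquad (10)$$

where

$$K_Z\{i\psi\} = \log E\left\{\exp\left[i\psi\int_0^1 dZ(u)\right]\right\}.$$

To obtain the joint characteristic function of $U(t)$ and $X(t) - x_0$ we put $\theta(u) = \psi\epsilon(u-t) + u\phi$ (where $\epsilon(u) = 1$, $u \ge 0$; $0$, $u<0$), and hence

$$\log C(\phi,\psi) = iu_0(\psi + \phi t) + \int_0^t K_Z\{i(\psi + \phi(t-u))\}\,du.$$

For $K_Z\{i\psi\} = -\tfrac{1}{2}\sigma^2\psi^2$, this agrees of course with (9).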
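A second well-known physical example is the Markov relation for the velocity $U(t)$ of a particle with damping proportional to it,

$$dU(t) + \alpha U(t)\,dt = dZ(t), \qquad (11)$$

which may be handled by similar methods. If we put $V(t) \equiv U(t)e^{\alpha t}$, we obtain

$$dV(t) = e^{\alpha t}\,dZ(t), \qquad V(t) = u_0 + \int_0^t e^{\alpha u}\,dZ(u),$$

and obtain in place of (10)

$$\log C\{\theta\} = iu_0\int_0^t e^{-\alpha u}\,d\theta(u) + \int_0^t K_Z\left\{i\int_u^t e^{-\alpha(v-u)}\,d\theta(v)\right\}du. \qquad (12)$$

Thus for $X(t) - x_0$ and $U(t)$ we now obtain

$$\log C(\phi,\psi) = iu_0\left[\psi e^{-\alpha t} + (1 - e^{-\alpha t})\phi/\alpha\right] + \int_0^t K_Z\left\{i\left[\psi e^{-\alpha(t-u)} + (1 - e^{-\alpha(t-u)})\phi/\alpha\right]\right\}du.$$

In particular, for normal diffusion

$$\log C(\phi,\psi) = iu_0\left[\psi e^{-\alpha t} + (1 - e^{-\alpha t})\phi/\alpha\right] - \tfrac{1}{2}\sigma^2\left[\left(\psi - \frac{\phi}{\alpha}\right)^2\frac{1 - e^{-2\alpha t}}{2\alpha} + \frac{2\phi}{\alpha^2}\left(\psi - \frac{\phi}{\alpha}\right)(1 - e^{-\alpha t}) + \frac{\phi^2t}{\alpha^2}\right], \qquad (13)$$

from which the variances and covariances of $U(t)$ and $X(t)$ may be deduced. The result (13) may alternatively be deduced directly from the equation for $C(\phi,\psi)$, which is

$$\frac{\partial C}{\partial t} = \phi\frac{\partial C}{\partial\psi} - \alpha\psi\frac{\partial C}{\partial\psi} - \tfrac{1}{2}\sigma^2\psi^2C, \qquad (14)$$

with equivalent equation for $f(u,x)$

$$\frac{\partial f}{\partial t} + u\frac{\partial f}{\partial x} = \alpha\frac{\partial(uf)}{\partial u} + \tfrac{1}{2}\sigma^2\frac{\partial^2 f}{\partial u^2}, \qquad (15)$$

sometimes referred to as the Fokker-Planck equation.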
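
For normal diffusion the quadratic part of log C for the damped velocity process is −½σ² times the integral over (0, t) of [ψe^{−αs} + (1 − e^{−αs})φ/α]². The sketch below compares a closed form of that integral with direct quadrature. The closed form is the one obtained by expanding the square, the function names are hypothetical, and the check is an illustration rather than part of the original argument.

```python
import math

def quadratic_closed_form(phi, psi, alpha, t):
    """Closed form of the integral of [psi e^{-alpha s} + (1 - e^{-alpha s}) phi/alpha]^2 over (0, t)."""
    a = psi - phi / alpha
    b = phi / alpha
    return (a * a * (1.0 - math.exp(-2.0 * alpha * t)) / (2.0 * alpha)
            + 2.0 * a * b * (1.0 - math.exp(-alpha * t)) / alpha
            + b * b * t)

def quadratic_by_quadrature(phi, psi, alpha, t, steps=20000):
    """Trapezoidal approximation to the same integral."""
    h = t / steps

    def f(s):
        e = math.exp(-alpha * s)
        return (psi * e + (1.0 - e) * phi / alpha) ** 2

    total = 0.5 * (f(0.0) + f(t))
    for k in range(1, steps):
        total += f(k * h)
    return total * h

if __name__ == "__main__":
    print(quadratic_closed_form(0.7, -0.4, 1.3, 2.0))
    print(quadratic_by_quadrature(0.7, -0.4, 1.3, 2.0))
```

Setting φ = 0, ψ = 1 isolates the variance of U(t) per unit σ², namely (1 − e^{−2αt})/(2α), which tends to 1/(2α) as t increases.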

Suppose now the motion is no longer unrestricted, but because of a reflecting barrier at $x = b$ $(x_0 = 0, b > 0)$, the motion of a particle when it hits the barrier is reversed. This implies that, relative to any particle very near the barrier and approaching it with velocity $U(t)$, there is just as likely to be one just reflected from it, and receding with velocity $-U(t)$. This gives the condition on the density, $f_1(u,x)$ say,

$$f_1(u,b) = f_1(-u,b). \qquad (16)$$

The method of images referred to in §3-1 may conveniently be used to obtain a solution satisfying (8) or (15), and (16). The reflection of the particle means that the unrestricted density $f(u,x)$ at $(u,x)$ is reinforced by an additional density emanating from an image point at $x = 2b$, with $u$ reversed in sign (to allow for the reflection); the net density at $(u,x)$ is thus

$$f_1(u,x \mid 0,u_0) = f(u,x \mid 0,u_0) + f(u,x \mid 2b,-u_0), \qquad (17)$$


where $f(u,x)$ is the unrestricted solution of (8) or (15). To verify that (17) satisfies (16) it is sufficient to note that for either (8) or (15) $f(u,x)$ has the form $g(u - u_0\eta(t),\, x - x_0 - u_0\xi(t))$, where the normal distribution $g(w,z)$ is a function of a quadratic expression in $w$ and $z$. This gives from (17)

$$f_1(u,b \mid 0,u_0) = g(u - u_0\eta,\, b - u_0\xi) + g(u + u_0\eta,\, -b + u_0\xi) = f_1(-u,b \mid 0,u_0),$$


as reversing the sign of u merely interchanges the two terms in 
g in this equation. This method may be extended to give the 
solution for two reflecting barriers on either side of the origin, 
the number of reversals of sign of u, corresponding to the order 
in the infinite series of images on either side. (Notice, however, 
that result (17) with the second term subtracted does not in 
general give the solution for an absorbing barrier at $b$, for the appropriate condition $f_1(u,b) = 0$ $(u<0)$ is not satisfied.) Obvious extensions of all the equations in this section are to motion in two or three dimensions. In physical problems further forces (e.g. gravitation) may also need to be introduced (see, for example, Chandrasekhar, 1943; Moyal, 1949).
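
The image construction for a reflecting barrier can be checked directly in the undamped case. The sketch below assumes the unrestricted density is the joint normal density of (U, X) with means (u_0, x_0 + u_0 t), variances σ²t and σ²t³/3 and covariance σ²t²/2, and forms the image solution by adding the density started from x = 2b with velocity −u_0. Function names and numerical values are illustrative only.

```python
import math

def free_density(u, x, t, x0, u0, sigma=1.0):
    """Joint normal density of (U(t), X(t)) for dU = dZ, dX = U dt, started at (u0, x0)."""
    var_u = sigma ** 2 * t
    var_x = sigma ** 2 * t ** 3 / 3.0
    cov = sigma ** 2 * t ** 2 / 2.0
    det = var_u * var_x - cov * cov
    du = u - u0
    dx = x - x0 - u0 * t
    q = (var_x * du * du - 2.0 * cov * du * dx + var_u * dx * dx) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def reflected_density(u, x, t, b, u0, sigma=1.0):
    """Image solution: source at x = 0 with velocity u0 plus image at x = 2b with velocity -u0."""
    return (free_density(u, x, t, 0.0, u0, sigma)
            + free_density(u, x, t, 2.0 * b, -u0, sigma))

if __name__ == "__main__":
    b, u0, t = 1.0, 0.4, 0.8
    print(reflected_density(0.7, b, t, b, u0), reflected_density(-0.7, b, t, b, u0))
```

At the barrier x = b the two printed values coincide, which is the symmetry condition required of the reflected density; away from the barrier no such symmetry in u is expected.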
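Distribution of $\dot{X}(t)$. The distribution of $\dot{X}(t)$ may be obtained from the joint distribution of $X(t)$ and $X(t+h)$. If the joint characteristic function is

$$C(\theta,\phi) \equiv E\{\exp[i\theta X(t) + i\phi X(t+h)]\},$$

then we must have for $X(t)$, $\dot{X}(t)$ the characteristic function

$$\lim_{h\to0} E\left\{\exp\left[i\theta X(t) + i\psi\,\frac{X(t+h) - X(t)}{h}\right]\right\} = \lim_{h\to0} C(\theta - \psi/h,\ \psi/h). \qquad (18)$$

For example, for a stationary normal process with

$$\log C(\theta,\phi) = -\tfrac{1}{2}\sigma^2\left[\theta^2 + \phi^2 + 2\theta\phi\,e^{-\frac{1}{2}h^2}\right],$$

formula (18) gives $-\tfrac{1}{2}\sigma^2(\theta^2 + \psi^2)$, consistently with $\dot{X}$ normal,

$$E\{X\dot{X}\} = 0, \qquad E\{\dot{X}^2\} = -\sigma^2\left[\partial^2(e^{-\frac{1}{2}h^2})/\partial h^2\right]_{h=0}.$$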


Systems of individuals. As an application of some of the results of this section, consider a system of $n$ individuals or particles moving independently in some area or volume according to a well-specified stochastic process. Such individuals may be prevented from ultimate dispersion by restricting, e.g. reflecting, boundaries, so that the system becomes stationary. It is possible to demonstrate this final stationarity for, for example, the type of motion represented by equation (15); we allow also the boundaries to recede to infinity and $n$ correspondingly to increase (the detailed proof is omitted). The spatial distribution becomes uniform, and the velocity distribution that consistent with (15), or its extension to more than one dimension, as $t \to \infty$. The joint distribution of velocity and position, given an initial velocity and position, together with the stationary spatial and velocity distribution, permits the calculation of the probability $P_{12}$ of a particle in a region 1 at time 0 being in a region 2 at time $t$.

For the two regions identical, $P_{12} \equiv Q$, where $P \equiv 1 - Q$ is sometimes called the probability after-effect. In this last case, the mean number $m$ being common to both regions, the joint p.g.f. for the numbers $N_1$ and $N_2$ of individuals in the region at 0 and $t$ respectively may be seen to be

$$\Pi(z_1,z_2) = \exp\{m(z_1 - 1) + m(z_2 - 1) + mQ(z_1 - 1)(z_2 - 1)\}. \qquad (19)$$

Comparison of this distribution with that holding for the stationary form of the immigration-emigration process shows that they are identical. The differences between such an exact specification as the above and the use of the immigration-emigration process (§3-41) as an approximation are

(i) the above model will not be Markovian in the number of individuals, $N$;

(ii) the mathematical form of $Q$ will not be the exponential function $e^{-\mu t}$.

The extent to which these differences become observationally 
important can be investigated. Of course if the mean density of 
individuals becomes large, the approximating assumption of 
independence retained above also breaks down. This last com- 
plication arises, for example, in statistical models of dense gases 
and liquids, but we shall not consider here the difficult problems 
arising in such a context, except to note that the motion of the 
entire system is then strictly only determined from the simultaneous detailed positions (and velocities) of all the particles. Logically the system is still classifiable as a point process in the sense of §3-42; in particular, the simultaneous probability of a particle in an element $dr$ of position-velocity phase-space and another in $ds$ will be of the form $f_2\,dr\,ds$, where $f_2$ is a ‘product-density’ of the second order. (For a detailed technical discussion of the molecular theory of fluids, see Green (1952).)

Chapter 6
STATIONARY PROCESSES

6-1 Processes stationary to the second order


In Chapter 1 we defined a stationary stochastic process as one for which the distribution function $F_n(r)$ for any set of times $t_1, \ldots, t_n$ was dependent only on the intervals $t_i - t_j$ and independent of the absolute values of the $t_i$; in other words, it is invariant under translation of the $t$ axis. We have already encountered such processes in earlier chapters. Thus if a homogeneous Markov process tends to a limiting distribution independent of initial conditions, it is evident that the process has become stationary in the above sense. Such an example was the ‘emigration-and-immigration’ process of §3-41 (equation (5)), with its limiting marginal distribution a Poisson distribution. Another general class which can provide stationary processes was the linear processes in §5-2.

In view of the importance of stationary processes in practice, representing a kind of stochastic equilibrium often typical of observed phenomena, e.g. in turbulence, communication systems, and time-series in general (a typical example is illustrated in Fig. 7), their properties are further developed in this chapter.

[Fig. 7. A continuous time-series, which shows the variation in mass per unit length along a cotton yarn. Vertical axis: grammes per kilometre; horizontal axis: distance along yarn (cm.). (The graph was kindly supplied by Mr G. A. R. Foster and has been reprinted from Applied Statistics, 2 (1953), 52.)]

We shall see that many of these properties will depend only on the structure of the process as specified by its first and second moments, always assumed finite (cf. §5-11), and hence will still
hold even if the process is not completely stationary in the sense 
so far defined, but merely has first and second moments in- 
variant under translation of the t axis. Processes stationary only 
to this extent will be called stationary to the second order. More 
generally, a process with finite moments will be called stationary 
to the rth order if all the moments of order r or less are invariant 
under translation (in §3-4 we had an example of a process 
stationary merely to the first order, with constant mean but 
increasing variance). 

In the discussion of processes stationary at least to the second order, we shall usually find it convenient to take $E\{X(t)\} \equiv m = 0$, and define the covariance function in general for complex $X(t)$ as

$$w(s-t) \equiv E\{X^*(t)X(s)\} \equiv \sigma^2\rho(s-t), \qquad (1)$$

where $\sigma^2 = E\{X^*(t)X(t)\}$ and $\rho(\tau)$ is called the autocorrelation function of $X(t)$, with $\rho(0) = 1$. For real $X(t)$, $\rho(\tau)$ is a real symmetric function of $\tau$; for complex $X(t)$, $\rho^*(\tau) = \rho(-\tau)$. In the case
of normal or Gaussian processes, i.e. processes whose distribution 
for any set t, ...,t, is multivariate normal, the only parameters 
of the distribution are the first- and second-order moments; it 
at once follows that a normal process, stationary to the second 
order, is completely stationary. The normal process is often 
theoretically useful in providing an example of a process with 
given admissible first and second moments.

Consistency relations for product moments. Among the various 
consistency relations that must hold for the simultaneous dis- 
tribution F,(r), the consistency relations for the correlations 
among $X_1, X_2, \ldots, X_n$ follow from the non-negative character of $E\{ZZ^*\}$, where

$$Z \equiv \lambda_1X_1 + \lambda_2X_2 + \ldots + \lambda_nX_n.$$

This means that the Hermitian form

$$\sum_{r,s}\lambda_r^*\lambda_s\,\mu(t_r,t_s) \ge 0 \qquad (2)$$

for any valid product-moment function $\mu(t,s)$, an important condition specifying the class of admissible product-moment functions. For processes stationary to the second order and with zero mean, (2) becomes

$$\sum_{r,s}\lambda_r^*\lambda_s\,\rho(t_s - t_r) \ge 0. \qquad (3)$$

It follows that determinants formed from the coefficients of the form (3) are non-negative, e.g.

$$\begin{vmatrix} 1 & \rho(t_2 - t_1) & \rho(t_3 - t_1) \\ \rho(t_1 - t_2) & 1 & \rho(t_3 - t_2) \\ \rho(t_1 - t_3) & \rho(t_2 - t_3) & 1 \end{vmatrix} \ge 0,$$

or if $t_3 - t_2 = t_2 - t_1 = h$, and $X$ is real,

$$[1 - \rho(2h)][1 - 2\rho^2(h) + \rho(2h)] \ge 0;$$

i.e., since $\rho(2h) \le 1$,

$$1 - 2\rho^2(h) + \rho(2h) \ge 0.$$
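
The three-point condition 1 − 2ρ²(h) + ρ(2h) ≥ 0 gives a simple necessary test for a proposed real autocorrelation function. The sketch below applies it to two candidates chosen for this illustration: e^{−|τ|}, which passes, and a rectangular function equal to 1 for |τ| < 1 and 0 otherwise, which fails. The function names are hypothetical, and passing the test does not by itself establish admissibility.

```python
import math

def three_point_determinant(rho, h):
    """Determinant of the 3x3 correlation matrix at equally spaced times, for real X."""
    a, b = rho(h), rho(2.0 * h)
    return (1.0 - b) * (1.0 - 2.0 * a * a + b)

def violations(rho, h_values, tol=1e-12):
    """Return the spacings h at which 1 - 2 rho(h)^2 + rho(2h) is negative."""
    return [h for h in h_values if 1.0 - 2.0 * rho(h) ** 2 + rho(2.0 * h) < -tol]

def exponential_rho(tau):
    return math.exp(-abs(tau))

def rectangular_rho(tau):
    return 1.0 if abs(tau) < 1.0 else 0.0

if __name__ == "__main__":
    hs = [0.1 * k for k in range(1, 31)]
    print("exponential:", violations(exponential_rho, hs))
    print("rectangular:", violations(rectangular_rho, hs))
```

For the rectangular candidate the condition fails whenever ρ(h) = 1 while ρ(2h) = 0, so such a function cannot be the autocorrelation function of any stationary process.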

Many such relations could be deduced in this way, but we obtain an equivalent comprehensive condition from the non-negative character of (3), since it at once follows from a theorem by Bochner (1936-7, Chapter 16) that if $\rho(\tau)$ is continuous at $\tau = 0$ (i.e. $X(t)$ is m.s. continuous and $\rho(\tau)$ is continuous for all $\tau$) we may write $\rho(\tau)$ in the form

$$\rho(\tau) = \int_{-\infty}^{\infty} e^{i\tau\omega}\,dF(\omega), \qquad (4)$$

where $F(\omega)$ is a never-decreasing function with $F(-\infty) = 0$ and $F(+\infty) = 1$, and thus has the character of a distribution function. From (4) it follows that $\rho(\tau)$ for m.s. continuous stationary processes has the mathematical form of the characteristic function (Fourier transform) of a distribution function. If $X(t)$ is real, we have $\rho(\tau)$ real and symmetric, and hence

$$\rho(\tau) = \int_0^{\infty} \cos\tau\omega\,dF_+(\omega). \qquad (5)$$

The function $F_+(\omega)$, say, for real processes need only be defined from 0 to $\infty$, with the relation $\int_0^{\infty} 2\,dF(\omega) = \int_0^{\infty} dF_+(\omega)$.


6-11 The spectral function. The function $F(\omega)$ is fundamental in the harmonic analysis of $X(t)$, as we shall see later. It will be called the spectral function for the stationary process $X(t)$, and for real processes $F_+(\omega)$ will be called the integrated ‘spectrum’.

In view of the importance of the result (4), a proof due to Loève (1945) not assuming Bochner's theorem (and in effect giving an alternative proof of Bochner's theorem) is sketched. We take in the non-negative form (3) $\lambda_r$ proportional to $e^{-i\omega t_r}$ and extend the sum to an integral (over a finite region for $s$ and $t$), so that we have

$$\int\!\!\int e^{-i\omega(s-t)}\rho(s-t)\,ds\,dt \ge 0.$$

We define (in the m.s. sense)

$$H(\omega) = \frac{1}{\sqrt{T}}\int_{-\frac{1}{2}T}^{\frac{1}{2}T} e^{-i\omega u}X(u)\,du,$$

so that

$$E\{H(\omega)H^*(\omega)\} \equiv 2\pi f_T(\omega), \text{ say,}$$

$$= \frac{1}{T}\int_{-\frac{1}{2}T}^{\frac{1}{2}T}\!\int_{-\frac{1}{2}T}^{\frac{1}{2}T} e^{-i\omega(u-v)}\rho(u-v)\,du\,dv = \int_{-T}^{T}\left(1 - \frac{|\tau|}{T}\right)e^{-i\omega\tau}\rho(\tau)\,d\tau, \qquad (1)$$

since $X(t)$ is stationary, this integral being, moreover, non-negative. Now define

$$\phi_T(\tau) = \begin{cases} (1 - |\tau|/T)\,\rho(\tau) & \text{for } |\tau| \le T, \\ 0 & \text{for } |\tau| > T. \end{cases}$$

Then

$$2\pi f_T(\omega) = \int_{-\infty}^{\infty} \phi_T(\tau)\,e^{-i\omega\tau}\,d\tau \ge 0. \qquad (2)$$

Multiply (2) by $(1 - |\omega|/\Omega)\,e^{i\omega t}/(2\pi)$ and integrate with respect to $\omega$ from $-\Omega$ to $\Omega$. We obtain

$$\int_{-\Omega}^{\Omega}\left(1 - \frac{|\omega|}{\Omega}\right)f_T(\omega)\,e^{i\omega t}\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}\phi_T(\tau)\,\frac{4\sin^2\tfrac{1}{2}\Omega(\tau - t)}{\Omega(\tau - t)^2}\,d\tau.$$

The term multiplying $e^{i\omega t}$ on the left-hand side is not negative, and hence the integral has the character of a characteristic function (apart perhaps from a multiplying constant). The right-hand side converges uniformly in every finite $t$-interval to $\phi_T(t)$ as $\Omega \to \infty$ (provided $\phi_T(t)$ is continuous, which follows from the assumed continuity of $\rho(t)$). Hence $\phi_T(t)$, for which

$$\phi_T(0) = \rho(0) = 1,$$

is mathematically a ‘characteristic function’ (using the limit property of characteristic function sequences, referred to in §1-31). From the same property it follows further that $\rho(t)$, the uniform limit of $\phi_T(t)$ in every finite $t$-interval as $T \to \infty$, is also a ‘characteristic function’. As $\phi_T(t) \to \rho(t)$, the required result follows.

The condition (4) (or (5) for real processes) is not only necessary but sufficient for the existence of a process with given autocorrelation function. For let

$$X(t) = e^{i(\Omega t + \theta)},$$

where $\Omega$ has the distribution function $F(\omega)$, and $\theta$ is independent of $\Omega$ and uniformly distributed over $(0, 2\pi)$. Then

$$E\{X(t)\} = 0, \qquad E\{X(t)X^*(t)\} = 1,$$

$$E\{X^*(t)X(s)\} = E\{e^{i\Omega(s-t)}\} = \int_{-\infty}^{\infty} e^{i(s-t)\omega}\,dF(\omega) = \rho(s-t).$$
The theorem of §6-1 was first given by Khintchine (1934) for real $X(t)$, but its relation with the harmonic analysis of $X(t)$ was already implicit in earlier work by Wiener (1930). The assumption that $\rho(\tau)$ is continuous, equivalent to m.s. continuity of $X(t)$, will be made unless the contrary is stated.

Corresponding to the representation of any distribution function by two components, the ‘discrete’ and ‘continuous’ components (strictly speaking, the second is the absolutely continuous component, but we shall assume that there is no singular component), we may write further

$$\rho(\tau) = \rho_1(\tau) + \rho_2(\tau), \qquad (3)$$

where

$$\rho_1(\tau) = \sum_r e^{i\tau\omega_r}\,p_r, \qquad (4)$$

$p_r$ being the discontinuities of $F(\omega)$ occurring at the points $\omega_r$, and

$$\rho_2(\tau) = \int_{-\infty}^{\infty} e^{i\tau\omega}f(\omega)\,d\omega, \qquad (5)$$

where $f(\omega)$ is the spectral density. The inversion formulae for $F(\omega)$ and $f(\omega)$, while of course identical with those relating characteristic and distribution functions, are quoted for reference. For all points of continuity of $F(\omega)$, we have

$$F(\omega_2) - F(\omega_1) = \lim_{T\to\infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-i\omega_1\tau} - e^{-i\omega_2\tau}}{i\tau}\,\rho(\tau)\,d\tau, \qquad (6)$$

or, if $dF(\omega) = f(\omega)\,d\omega$,

$$f(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega\tau}\rho(\tau)\,d\tau. \qquad (7)$$

In the case $X(t)$ real the formula for $F_+(\omega)$ is

$$F_+(\omega) = \frac{2}{\pi}\int_0^{\infty}\rho(\tau)\,\frac{\sin\tau\omega}{\tau}\,d\tau. \qquad (8)$$

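There are analogous theorems in the case of discrete time (with the equidistant intervals taken as unity). Thus Wold (1938) has shown that for real $X_r$, $\rho_s$ is the autocorrelation coefficient between $X_r$ and $X_{r+s}$ only if it can be expressed in the form

$$\rho_s = \int_0^{\pi}\cos s\omega\,dF_+(\omega). \qquad (9)$$

More generally for complex $X(t)$, we have

$$\rho_s = \int_{-\pi}^{\pi} e^{is\omega}\,dF(\omega). \qquad (10)$$

No continuity assumption for $X_r$ arises of course for stationary sequences. The inversion formulae in this case are

$$F(\omega_2) - F(\omega_1) = \frac{\omega_2 - \omega_1}{2\pi} + \lim_{n\to\infty}\frac{1}{2\pi}\sum_{\substack{s=-n \\ s\ne0}}^{n}\rho_s\,\frac{e^{-is\omega_1} - e^{-is\omega_2}}{is}, \qquad (11)$$

$$F_+(\omega) = \frac{\omega}{\pi} + \frac{2}{\pi}\sum_{s=1}^{\infty}\rho_s\,\frac{\sin s\omega}{s}, \qquad (12)$$

and when $dF(\omega) = f(\omega)\,d\omega$,

$$f(\omega) = \frac{1}{2\pi}\sum_{s=-\infty}^{\infty}\rho_s\,e^{-is\omega}. \qquad (13)$$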


It should perhaps be noticed that if a stationary sequence is extracted from a stationary process in continuous time by taking observations at intermittent intervals, the spectral function $dF(\omega)$ for the sequence does not coincide with that for the original process even over the range $-\pi$ to $\pi$, unless $dF(\omega)$ is zero for the original process outside this range; this is associated with the ambiguity of period of a harmonic series observed only at intervals, a higher order harmonic having the same observational characteristics.
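
The ambiguity of period for a harmonic series observed only at unit intervals is easily exhibited. In the sketch below (function names and frequencies are illustrative assumptions) a harmonic of angular frequency ω and one of frequency ω + 2π give identical values at integer times, and a helper folds any frequency into the range (−π, π].

```python
import math

def sampled_harmonic(omega, n_samples, phase=0.0):
    """Values of cos(omega * r + phase) at the integer times r = 0, 1, ..., n_samples - 1."""
    return [math.cos(omega * r + phase) for r in range(n_samples)]

def fold_frequency(omega):
    """Fold an angular frequency into the principal range (-pi, pi]."""
    folded = math.fmod(omega + math.pi, 2.0 * math.pi)
    if folded <= 0.0:
        folded += 2.0 * math.pi
    return folded - math.pi

if __name__ == "__main__":
    low = sampled_harmonic(0.5, 8)
    high = sampled_harmonic(0.5 + 2.0 * math.pi, 8)
    print(max(abs(p - q) for p, q in zip(low, high)))
    print(fold_frequency(0.5 + 2.0 * math.pi))
```

Any spectral weight of the continuous-time process outside (−π, π] is therefore attributed, in the sampled sequence, to the folded frequency inside that range.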

Spectral moments. In the last chapter we saw that the m.s. differentiation properties of $X(t)$ were associated with the differentiation properties of $\rho(\tau)$. The connection between $\rho(\tau)$ and the spectral function $F(\omega)$ implies that there will also be associated properties of $F(\omega)$. Thus for real $X(t)$ we noted for stationary processes for which $\dot{X}(t)$ exists as a m.s. differential coefficient

$$E\{\dot{X}(t+\tau)X(t)\} = E\{\dot{X}(t)X(t-\tau)\} = \sigma^2\rho'(\tau), \qquad (14)$$

$$E\{\dot{X}(t)\dot{X}(t-\tau)\} = -\sigma^2\rho''(\tau), \qquad (15)$$

where dashes denote differentiation with respect to $\tau$. Thus the autocorrelation function of $\dot{X}(t)$ is proportional to $-\rho''(\tau)$, which from (5) of §6-1 has the form

$$-\rho''(\tau) = \int_0^{\infty}\omega^2\cos\tau\omega\,dF_+(\omega). \qquad (16)$$

The relevant theorem, due to Slutsky, states that if the second moment of $F_+(\omega)$ is finite, then $X(t)$ is m.s. differentiable. This result follows from the existence, when $\int_0^{\infty}\omega^2\,dF_+(\omega)$ is finite, of the functions

$$\rho'(\tau) = -\int_0^{\infty}\omega\sin\tau\omega\,dF_+(\omega), \qquad \rho''(\tau) = -\int_0^{\infty}\omega^2\cos\tau\omega\,dF_+(\omega),$$

whence $\rho'(\tau)$ is continuous with $\rho'(0) = 0$ and $\rho''(\tau)$ is continuous. It may be shown further (cf. Doob, 1944, p. 283) that $X(t)$ is a.c. differentiable.† For let $Y(t)$ be the stochastic process defined by the m.s. derivative $\dot{X}(t)$, and consider

$$Z \equiv X(t) - X(0) - \int_0^t Y(u)\,du.$$

It is readily shown that $E\{Z^2\} = 0$, whence $Z = 0$ a.c. Hence every realized function $x(t)$ is absolutely continuous with differential coefficient $y(t)$, the latter existing for almost all $t$. Since $X(t)$ and hence $Y(t)$ are stationary, the qualification ‘almost everywhere’ may be omitted, and […]

Among other relations between $\rho(\tau)$ and $F(\omega)$, we note the obvious one from (7) that if $\rho(\tau)$ is absolutely integrable, $f(\omega)$ exists. As an example, consider the case $\rho(\tau) = \exp[-\alpha|\tau|]$, which we saw in §5-11 implied that $X(t)$ was m.s. continuous, but not m.s. differentiable. We have $\rho(\tau)$ absolutely integrable, and the corresponding spectral density is

$$f(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega\tau - \alpha|\tau|}\,d\tau = \frac{\alpha}{\pi(\alpha^2 + \omega^2)}. \qquad (17)$$

However, the second moment of $\omega$ does not exist, corresponding to the non-existence of $\dot{X}(t)$ for this process.

† There is a tacit ‘regularity’ condition on $X(t)$ assumed here (cf. also Doob, p. 536).
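
The inversion of an absolutely integrable autocorrelation function into a spectral density can be illustrated numerically. The sketch below assumes an exponential autocorrelation ρ(τ) = e^{−α|τ|} (the symbol α and the function names are assumptions of this illustration) and compares a trapezoidal evaluation of (1/π)∫ cos(ωτ) e^{−ατ} dτ over (0, ∞), truncated at a large upper limit, with the Cauchy-type form α/[π(α² + ω²)].

```python
import math

def spectral_density_numeric(omega, alpha, upper_factor=40.0, steps=40000):
    """Trapezoidal approximation to (1/pi) * integral_0^inf cos(omega tau) e^{-alpha tau} d tau."""
    upper = upper_factor / alpha
    h = upper / steps

    def integrand(tau):
        return math.cos(omega * tau) * math.exp(-alpha * tau)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h / math.pi

def spectral_density_closed(omega, alpha):
    """Closed form alpha / (pi (alpha^2 + omega^2))."""
    return alpha / (math.pi * (alpha ** 2 + omega ** 2))

if __name__ == "__main__":
    for omega in (0.0, 1.0, 2.0):
        print(omega, spectral_density_numeric(omega, 1.5), spectral_density_closed(omega, 1.5))
```

Since ω² times this density tends to the constant α/π for large ω, the second spectral moment diverges, in keeping with the absence of a m.s. derivative.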

An additive process $Z(t)$, or, what is equivalent for correlational properties, an orthogonal process, may have non-zero stationary increments $dZ(t)$ with autocorrelation function $\rho(0) = 1$, $\rho(\tau) = 0$ $(\tau \ne 0)$. Such a process is not m.s. continuous, and has no proper spectrum, but its ‘spectrum’ may be thought of as a kind of limit with all ‘frequencies’ $\omega$ having equal weight, i.e. as a uniform distribution with a range tending to infinity. It is the analogue of a completely random stationary sequence of independent variables in discrete time, for which the spectrum is a uniform distribution from $-\pi$ to $\pi$ (or 0 to $\pi$ for $f_+(\omega)$).


6-12 Stationary point processes and covariance densities. The stationary processes so far discussed have had some kind of continuity, apart from the increments of the orthogonal process just referred to. If, however, we have a point process $N(t)$ in time (cf. §3-42), such that $dN(t) = 1$ or 0, such processes may be stationary but not orthogonal. In fact, simple renewal processes (with a suitable random starting time for the renewals, or alternatively if started up a long time ago) will constitute examples, of which the simple Poisson additive process is only a very special case. For such processes we have seen that $E\{dN(t)\,dN(\tau)\}$ $(t \ne \tau)$ is of a smaller order of magnitude than $E\{[dN(t)]^2\} = E\{dN(t)\}$, but, nevertheless, we may define a stationary covariance density (cf. equation (11), §4-3)

$$w(t-\tau)\,dt\,d\tau = \text{cov}\{dN(t), dN(\tau)\} \equiv E\{dN(t)\,dN(\tau)\} - [E\{dN(t)\}]^2 \quad (t \ne \tau),$$

such a density function $w(t-\tau)$ being only zero for orthogonal increments. That is, we assume

$$E\{dN(t)\,dN(\tau)\} = \begin{cases} \lambda\,dt & (t = \tau), \\ [w(t-\tau) + \lambda^2]\,dt\,d\tau & (t \ne \tau). \end{cases}$$

As an example suppose the distribution of the time-interval between successive events in a simple renewal process is the distribution of example 2, §2-11, namely (with $x = 2\lambda$),

$$f(t)\,dt = 4\lambda^2te^{-2\lambda t}\,dt \quad (0 < t < \infty).$$

This renewal distribution implies that the probability of an event in $(t, t+dt)$, given an event at $t = 0$, but regardless of possible events in the meantime, is (formula (6), §3-3)

$$\lambda\,dt - \lambda e^{-4\lambda t}\,dt,$$

whence, letting $t \to \infty$, we see that the mean density $E\{dN(t)\}/dt$ is $\lambda$. We have

$$E\{dN(t)\,dN(\tau)\} = \lambda\,d\tau\,[\lambda\,dt - \lambda e^{-4\lambda(t-\tau)}\,dt] \quad (t > \tau),$$

whence

$$w(t-\tau) = -\lambda^2e^{-4\lambda(t-\tau)} \quad (t > \tau). \qquad (1)$$

This density is negative, implying an ‘inhibition’ by one event of another shortly afterwards (this is why it was used as a possible model in the theory of genetic recombination). Consistently with this idea, we should expect the variance of the number of events occurring in any non-zero interval $\Delta t$ to be less than the Poisson value $\lambda\Delta t$. We find in fact

$$E\left\{\left[\int_t^{t+\Delta t} dN(u)\right]^2\right\} - (\lambda\Delta t)^2 = \int_t^{t+\Delta t}\!\!\int_t^{t+\Delta t} w(u-v)\,du\,dv + \int_t^{t+\Delta t}\lambda\,du,$$

the second term in the last expression arising from the contribution for $u = v$ in the double integral. After substitution from (1) and some reduction we obtain

$$\tfrac{1}{2}\lambda\Delta t + \tfrac{1}{8}\left(1 - e^{-4\lambda\Delta t}\right), \qquad (2)$$

which is less than $\lambda\Delta t$ for $\Delta t > 0$.
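
The reduced variance of counts for this renewal process can be examined by simulation. The sketch below is a rough Monte Carlo illustration with hypothetical names and parameter values: intervals are generated as sums of two exponential variables of rate 2λ (which gives the density 4λ²te^{−2λt}), an initial stretch is discarded so that the process is approximately stationary, and the sample variance of counts in windows of length Δt is compared with ½λΔt + ⅛(1 − e^{−4λΔt}), the value obtained here by integrating the covariance density −λ²e^{−4λ|u−v|} over the window and adding λΔt.

```python
import math
import random

def count_variance(lam=1.0, window=1.0, n_windows=20000, burn_in=50.0, seed=1):
    """Sample mean and variance of event counts in successive windows of a renewal process."""
    rng = random.Random(seed)
    horizon = burn_in + n_windows * window
    counts = [0] * n_windows
    t = 0.0
    while True:
        # Interval = sum of two exponentials of rate 2*lam.
        t += rng.expovariate(2.0 * lam) + rng.expovariate(2.0 * lam)
        if t >= horizon:
            break
        if t >= burn_in:
            idx = min(int((t - burn_in) / window), n_windows - 1)
            counts[idx] += 1
    mean = sum(counts) / n_windows
    var = sum((c - mean) ** 2 for c in counts) / (n_windows - 1)
    return mean, var

def theoretical_variance(lam, window):
    """Variance of the count in a window, from the covariance density plus the Poisson term."""
    return 0.5 * lam * window + 0.125 * (1.0 - math.exp(-4.0 * lam * window))

if __name__ == "__main__":
    mean, var = count_variance()
    print("mean count:", mean, "variance:", var)
    print("theory:", theoretical_variance(1.0, 1.0), "Poisson value:", 1.0)
```

The simulated variance should fall well below the Poisson value λΔt, reflecting the inhibition of one event by another.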

For stationary point processes, the covariance density $w(t-\tau)$ has no spectrum definable as for a covariance function in the sense of §6-11, as its value for $t = \tau$ is of a different order of magnitude, but an extension of the theory of §6-11 to cover it may be made (see Moyal (M)). We may express the relation (4) of §6-1 more generally for the homogeneous integrated process

$$Y(t) = \int_0^t X(u)\,du \qquad (3)$$

as

$$E\{Y(t)Y^*(\tau)\} = \lim_{\Omega\to\infty}\int_{-\Omega}^{\Omega}\left(\frac{e^{i\omega t} - 1}{i\omega}\right)\left(\frac{e^{-i\omega\tau} - 1}{-i\omega}\right)dF(\omega), \qquad (4)$$

and such a relation will exist also for a process such as

$$Y(t) \equiv N(t) - \lambda t = \int_0^t dN(u) - \lambda t, \qquad (5)$$

of the type considered above, even although $N(t)$ is not differentiable.


6-2 Generalized harmonic analysis

There is an important analysis of any stationary process X(t) 
into orthogonal components, having an intimate relation with 
the properties of the correlation and covariance functions dis- 
cussed in § 6-11. This follows from the general expansion theorem 


quoted at the end of § 5-11, this theorem indicating the existence 
of a harmonic analysis of X (t) in the form 


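$$X(t) = \int_{-\infty}^{\infty} e^{it\omega}\,dZ(\omega), \qquad (1)$$

where $Z(\omega)$ is an orthogonal process and the integral in (1) is interpreted in the m.s. sense. If we write $V(\omega) \equiv E\{|Z(\omega)|^2\}$, then $V(\omega)$ is a never-decreasing function with

$$E\{|X(t)|^2\} = \sigma^2 = V(\infty) - V(-\infty),$$

so that for $V(-\infty) = 0$, $V(\infty) = \sigma^2$. Then

$$E\{X^*(t)X(t+\tau)\} = \int\!\!\int e^{i(t+\tau)\omega - it\omega'}\,E\{dZ(\omega)\,dZ^*(\omega')\} = \int_{-\infty}^{\infty} e^{i\tau\omega}\,dV(\omega), \qquad (2)$$

so that $V(\omega) = \sigma^2F(\omega)$. Note that while $V(\omega)$ is bounded, $Z(\omega)$ need only be of m.s. bounded variation.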

The expansion theorem quoted in §5-11 is most elegantly established by means of Hilbert space representations of stationary processes, but in view of the close relation of (1) to physical harmonic analysis and the theory of linear ‘filters’, it may be useful to sketch a more direct heuristic treatment of Blanc-Lapierre and Fortet (see Lévy, p. 125; cf. also the proof of Khintchine's theorem given in §6-11).

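Consider the linear transformation or ‘filter’ operating on $X(t)$, giving

$$Y(t) = \int_{-\infty}^{\infty} X(t-u)\,g(u)\,du. \qquad (3)$$

To see the effect of $g(u)$, suppose $X(u)$ can in fact be written in the form (1); then

$$Y(t) = \int_{-\infty}^{\infty} e^{it\omega}\,dZ(\omega)\int_{-\infty}^{\infty} g(v)\,e^{-iv\omega}\,dv = \int_{-\infty}^{\infty} h(\omega)\,e^{it\omega}\,dZ(\omega), \qquad (4)$$

where

$$h(\omega) \equiv \int_{-\infty}^{\infty} g(v)\,e^{-iv\omega}\,dv \equiv \psi(\omega)\,e^{i\theta(\omega)}, \qquad (5)$$

say, is called the ‘gain’ of the filter, the amplitude of each component $dZ(\omega)$ being multiplied by $\psi$ and the phase increased by $\theta$. If we now take (formally) $h(\omega) = 1$ in $(\omega_1, \omega_2)$ and 0 elsewhere, then (4) would give

$$Y(t; \omega_1, \omega_2) = \int_{\omega_1}^{\omega_2} e^{it\omega}\,dZ(\omega);$$

the corresponding $g(v)$ from (5) is

$$g(v) = \frac{1}{2\pi}\,\frac{e^{iv\omega_2} - e^{iv\omega_1}}{iv},$$

whence (3) becomes

$$Y(t; \omega_1, \omega_2) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{iv\omega_2} - e^{iv\omega_1}}{iv}\,X(t-v)\,dv.$$

The function $h(\omega)$ just taken is discontinuous, and so is not compatible with integrability of $g(v)$, but this difficulty may be overcome by a suitable passage to the limit, leading to the equation

$$\int_{\omega_1}^{\omega_2} e^{it\omega}\,dZ(\omega) = \frac{1}{2\pi}\,\underset{T\to\infty}{\text{l.i.m.}}\int_{-T}^{T}\frac{e^{iv\omega_2} - e^{iv\omega_1}}{iv}\,X(t-v)\,dv, \qquad (6)$$

or in particular when $t = 0$

$$\int_{\omega_1}^{\omega_2} dZ(\omega) = \frac{1}{2\pi}\,\underset{T\to\infty}{\text{l.i.m.}}\int_{-T}^{T}\frac{e^{-iu\omega_1} - e^{-iu\omega_2}}{iu}\,X(u)\,du. \qquad (7)$$

We have been led to the formula (6) starting from (1), but if we now define $Z(\omega)$ by (6), it is not difficult to show that

(i) $Y(t; \omega_1, \omega_2)$ and $Y(t; \omega_3, \omega_4)$ are uncorrelated for any non-overlapping intervals $(\omega_1, \omega_2)$, $(\omega_3, \omega_4)$, and hence that $e^{it\omega}\,dZ(\omega)$ and in particular $Z(\omega)$ represents an orthogonal process; and further,

(ii) $X(t)$ is identical with the sum of all such non-overlapping components $Y(t)$, whence (1) results. It may also be noted that if $X(t)$ is completely stationary, so also is each of its components of equation (6).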

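The integrated form of the inversion integral (7) should be compared with an integrated form of (1) which covers also the processes of § 6.12, viz.

$$Y(t) = \int_0^t X(u)\,du = \operatorname*{l.i.m.}_{\Omega\to\infty} \int_{-\Omega}^{\Omega} \left(\frac{e^{it\omega}-1}{i\omega}\right) dZ(\omega). \tag{8}$$

We note also the harmonic analysis for processes $X_r$ in discrete time,

$$X_r = \int_{-\pi}^{\pi} e^{ir\omega}\,dZ(\omega), \tag{9}$$

corresponding to equation (10) of § 6.11.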

A further theorem associated with the harmonic expansion (1) is that, if $X(t)$ is defined by an expansion of the form (1), $X(t)$ is stationary to the second order if and only if $Z(\omega)$ is orthogonal (it is assumed $E\{Z(\omega)\} = 0$). This follows at once from the covariance equation

$$E\{X^*(\tau)\,X(t)\} = \int\!\!\int e^{it\omega - i\tau\omega'}\,E\{dZ^*(\omega')\,dZ(\omega)\}, \tag{10}$$

for a necessary and sufficient condition for the covariance to be of the required form $w(t-\tau)$ is that each term given by the harmonic decomposition of the integral on the right-hand side has this property, whence $E\{\Delta Z^*(u)\,\Delta Z(v)\} = 0$ for disjoint intervals $\Delta u$, $\Delta v$: the orthogonal property.
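The equivalence just stated can be checked on a finite sum. The sketch below is an illustration only: it assumes a process $X(t) = \sum_k A_k e^{i\omega_k t}$ with finitely many frequencies, and the function name `covariance` and the numerical values are hypothetical. It evaluates the double sum that corresponds to (10) and shows that the result is free of $t$ when the second-moment matrix of the $A_k$ is diagonal (orthogonal increments), but not otherwise.

```python
import cmath

def covariance(t, tau, freqs, second_moments):
    """E{X*(t) X(t+tau)} for X(t) = sum_k A_k exp(i * freqs[k] * t).

    second_moments[j][k] is the assumed value of E{A_j^* A_k}.
    """
    total = 0j
    for j, wj in enumerate(freqs):
        for k, wk in enumerate(freqs):
            phase = cmath.exp(1j * (wk * (t + tau) - wj * t))
            total += second_moments[j][k] * phase
    return total

freqs = [-1.0, 0.5, 2.0]                      # hypothetical frequencies
orthogonal = [[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 0.5]]                # uncorrelated amplitudes
correlated = [[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.0],
              [0.0, 0.0, 0.5]]                # two amplitudes correlated

tau = 1.3
# Orthogonal case: the covariance depends on tau only.
same = abs(covariance(0.0, tau, freqs, orthogonal)
           - covariance(1.0, tau, freqs, orthogonal))
# Correlated case: the covariance changes with t, so X(t) is not stationary.
shift = abs(covariance(0.0, tau, freqs, correlated)
            - covariance(1.0, tau, freqs, correlated))
```

In the diagonal case only the terms with $j = k$ survive, each contributing $E\{|A_k|^2\}e^{i\omega_k\tau}$, which is the discrete analogue of (2).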

Discrete and continuous spectra. The discrete and (absolutely) continuous components of $F(\omega)$ (see equations (3)-(5), § 6.11) imply a similar analysis of $X(t)$. For $V(\omega)$ the jump or saltus at $\omega$ is (see Cramér, 1937, p. 24)

$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} e^{-i\tau\omega}\,w(\tau)\,d\tau. \tag{11}$$

We should expect correspondingly, at each such point $\omega_n$,

$$A_n \equiv Z(\omega_n + 0) - Z(\omega_n - 0) = \operatorname*{l.i.m.}_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} e^{-i\omega_n t}\,X(t)\,dt, \tag{12}$$

an ergodic property which will follow from a result to be obtained in § 6.21 (that

$$A \equiv \operatorname*{l.i.m.}_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} X(t)\,dt \tag{13}$$

gives the jump in $Z(\omega)$ at $\omega = 0$), by our regarding $e^{-i\omega_n t}X(t)$ as a new stationary process $Y(t)$. Assuming this result for the moment, we may write

$$X(t) = \sum_n A_n e^{i\omega_n t} + \int_{-\infty}^{\infty} e^{it\omega}\,dZ_2(\omega),$$

where $E\{A_n A_k^*\} = 0$ $(n \neq k)$, and $dE\{Z_2 Z_2^*\}/d\omega$ is proportional to the spectral density component $f(\omega)$.
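A small numerical illustration of the time average in (12) may help. The sketch assumes a single realization with a purely discrete spectrum, $x(t) = \sum_k a_k e^{i\omega_k t}$ with finitely many lines; the function name and the sample values are hypothetical. For such a realization the average over $(-T, T)$ can be written in closed form, each line contributing $a_k \sin\{(\omega_k - \omega)T\}/\{(\omega_k - \omega)T\}$, so no numerical integration is needed.

```python
import math

def time_average_amplitude(T, target, freqs, amps):
    """(1/2T) * integral_{-T}^{T} exp(-i*target*t) x(t) dt for
    x(t) = sum_k amps[k] * exp(i * freqs[k] * t), evaluated in closed form."""
    total = 0j
    for w, a in zip(freqs, amps):
        d = (w - target) * T
        total += a * (1.0 if d == 0 else math.sin(d) / d)
    return total

freqs = [0.0, 1.0, 2.5]          # hypothetical line frequencies
amps = [0.5, 2 - 1j, 1j]         # hypothetical complex amplitudes

at_line = time_average_amplitude(1e6, 1.0, freqs, amps)    # close to 2 - 1j
off_line = time_average_amplitude(1e6, 0.7, freqs, amps)   # close to 0
```

As $T$ grows the average picks out the amplitude at the target frequency and vanishes at frequencies carrying no line, which is the behaviour (12) asserts in the m.s. sense.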


6.21 The ergodic property. The value of the integral in (13) is a crucial one in the theory of stationary processes, for (with $E\{X(t)\} = 0$) only if $A = 0$ can the probability average $E\{X(t)\}$ be replaced by the time-average in this equation. The existence and value of $A$ readily follows from the Hilbert space theory already mentioned; here, however, we indicate Khintchine's original method of proof (1934).

If we write $U_T$ for $A$ before the limit is taken, then $A$ exists if

$$\epsilon(T, V) \equiv E\{|U_{T+V} - U_T|^2\} \to 0$$

as $T \to \infty$, uniformly for all $V > 0$ (cf. § 5.1). Now by evaluating $\epsilon(T, V)$ by expanding the term on the right we find

$$\epsilon(T,V) = \sigma^2\left\{\frac{V^2}{4T^2(T+V)^2}\int_{-T}^{T}\!\!\int_{-T}^{T}\rho(u-v)\,du\,dv + \frac{1}{4(T+V)^2}\int_{D}\!\int_{D}\rho(u-v)\,du\,dv - \frac{V}{4T(T+V)^2}\int_{-T}^{T}\!\!\int_{D}\left[\rho(u-v)+\rho(v-u)\right]du\,dv\right\}, \tag{1}$$

where $D$ denotes a range of integration $-T-V$ to $-T$ and $T$ to $T+V$. Making use of equation (4), § 6.1, we find further that


&T,Vj- "m | jm Ie + e ^c) cos? (To + $V) 


m) C) cos (T'o 4-3 vo) aro) , (2) 


an expression which tends to zero uniformly for all $V > 0$. This follows for bounded $V$ from the initial factor; for arbitrarily large $V$ it follows from an auxiliary mathematical lemma due to Khintchine, that if in

$$I = \int_{-\infty}^{\infty} \psi(\omega;\,T_1, T_2, \ldots)\,dF(\omega)$$

$\psi$ is a real continuous function of $\omega$ such that

(i) $|\psi|$ is bounded for all $\omega, T_1, T_2, \ldots$,
(ii) $\psi \to C$ as $\omega \to 0$ for any fixed $T_1, T_2, \ldots$,
(iii) $\psi \to 0$ as $T_1, T_2, \ldots \to \infty$, uniformly in $|\omega| \geq \delta > 0$,

then $I \to C\,(F(0+) - F(0-))$ as $T_1, T_2, \ldots \to \infty$. In the expression (2), $C = 0$, and hence $\epsilon(T, V) \to 0$.


It follows that the m.s. limit $A$ exists. Only under certain conditions, however, is $A = 0 = E\{X(t)\}$. For

$$E\{|U_T|^2\} = \frac{\sigma^2}{4T^2}\int_{-T}^{T}\!\!\int_{-T}^{T}\rho(u-v)\,du\,dv = \sigma^2\int_{-\infty}^{\infty}\left(\frac{\sin T\omega}{T\omega}\right)^2 dF(\omega),$$

which from Khintchine's lemma tends to $\sigma^2(F(0+) - F(0-))$ and is only zero provided $F(\omega)$ has no discontinuity at $\omega = 0$. From (11) of the previous section, this condition is equivalent to

$$\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} w(\tau)\,d\tau = 0. \tag{3}$$

We have defined $A$ above as the m.s. limit of the symmetrical time-average of $X(t)$, from $-T$ to $T$; but of course, as $X(t)$ is stationary, the time-average from $0$ to $T$ will also tend to the same limit.
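The limit of $E\{|U_T|^2\}$ can be illustrated numerically. The sketch below assumes a real correlation function of the hypothetical mixed form $\rho(s) = p_0 + (1-p_0)e^{-\mu|s|}$, i.e. a spectrum with a jump $p_0$ at $\omega = 0$ plus a continuous part; the function names are illustrative only. It uses the identity $\frac{1}{4T^2}\int_{-T}^{T}\int_{-T}^{T}\rho(u-v)\,du\,dv = \frac{1}{2T^2}\int_0^{2T}(2T-s)\rho(s)\,ds$, valid for real even $\rho$.

```python
import math

def ms_time_average(T, rho, steps=20000):
    """E{|U_T|^2} / sigma^2 = (1/(2 T^2)) * int_0^{2T} (2T - s) rho(s) ds,
    approximated by the midpoint rule (rho real and even)."""
    h = 2.0 * T / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += (2.0 * T - s) * rho(s)
    return total * h / (2.0 * T * T)

def mixed_rho(p0, mu):
    """Hypothetical correlation function: jump p0 at zero frequency plus
    a continuous component with correlation exp(-mu |s|)."""
    return lambda s: p0 + (1.0 - p0) * math.exp(-mu * abs(s))

with_jump = ms_time_average(200.0, mixed_rho(0.25, 1.0))   # near 0.25
no_jump = ms_time_average(200.0, mixed_rho(0.0, 1.0))      # near 0
```

With $p_0 = 0.25$ the mean square of the time average settles near $0.25$, the size of the jump of $F$ at the origin, and with $p_0 = 0$ it decays like $1/(\mu T)$, in line with condition (3).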
6.3 Processes with continuous spectra

In equation (22), § 5.2, we defined the linear process

$$X(t) = \int_{-\infty}^{t} g(t-u)\,dY(u), \tag{1}$$

where $Y(u)$ was additive, and from its characteristic functional saw that $X(t)$ was completely stationary. (We also defined a linear process for discrete time with integration replaced by summation, but, as elsewhere in this chapter, the appropriate changes in the formulae for stationary sequences are usually evident.) If only second-order moment properties are being considered, it is convenient to allow $Y(u)$ and $g(u)$ in (1) to be possibly complex, and the condition that $X(t)$ is well-defined as a m.s. limit is then that its second moment

$$\sigma_X^2 = \sigma^2\int_0^{\infty} g(u)\,g^*(u)\,du < \infty. \tag{2}$$

The autocovariance function $w(\tau)$ is generalized to

$$w(\tau) = \sigma^2\int_{-\infty}^{\infty} g^*(u)\,g(u+\tau)\,du \quad (g(u) = 0 \text{ for } u < 0). \tag{3}$$

This last formula shows at once, if we write

$$h(\omega) = \int_{-\infty}^{\infty} e^{-i\omega v}\,g(v)\,dv, \tag{4}$$

that the spectral density exists and is given by

$$f(\omega) = \frac{\sigma^2}{2\pi\sigma_X^2}\int_{-\infty}^{\infty} e^{-i\omega\tau}\left[\int_{-\infty}^{\infty} g^*(u)\,g(u+\tau)\,du\right]d\tau = \frac{\sigma^2}{2\pi\sigma_X^2}\,h(\omega)\,h^*(\omega). \tag{5}$$

This formula remains true if $Y(u)$ is orthogonal instead of additive, and if $g(u)$ extends over negative $u$ (with $-\infty$ as lower limit in (2)), provided the integral in (2) converges. The correlational and spectral properties of random functions of this kind are in fact included in those obtained by linear transformations of a stationary process (the 'filter' of § 6.2). In equation (3) of § 6.2 we had defined (with suitable change of notation for the random functions)

$$X(t) = L\{U(t)\} = \int_{-\infty}^{\infty} g(t-u)\,U(u)\,du, \tag{6}$$

and found that the linear operator $L\{\ldots\}$ defined by (6) was equivalent to an extra factor $h(\omega)$ in the harmonic analysis of $X(t)$ compared with $U(t)$, i.e.

$$U(t) = \int_{-\infty}^{\infty} e^{it\omega}\,dZ(\omega), \qquad X(t) = \int_{-\infty}^{\infty} e^{it\omega}\,h(\omega)\,dZ(\omega).$$

$X(t)$ is necessarily stationary to the second order like $U(t)$, as $h(\omega)\,dZ(\omega)$ has similar orthogonal properties to $dZ(\omega)$. It is assumed that $\sigma_X^2$ is finite, i.e.

$$\int_{-\infty}^{\infty} h(\omega)\,h^*(\omega)\,dE\{Z(\omega)Z^*(\omega)\} = \int_{-\infty}^{\infty} |h(\omega)|^2\,dV(\omega) < \infty. \tag{7}$$

We have the further results

$$E\{U^*(t)\,X(t+\tau)\} = \int_{-\infty}^{\infty} e^{i\tau\omega}\,h(\omega)\,dV(\omega), \tag{8}$$

$$E\{X^*(t)\,X(t+\tau)\} = \int_{-\infty}^{\infty} e^{i\tau\omega}\,h(\omega)\,h^*(\omega)\,dV(\omega), \tag{9}$$

and if $dV(\omega) = \sigma_U^2 f_U(\omega)\,d\omega \equiv v(\omega)\,d\omega$, then from (9)

$$\sigma_X^2 f_X(\omega) = h(\omega)\,h^*(\omega)\,v(\omega). \tag{10}$$
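Relations (4), (5) and (10) can be tried out on a concrete kernel. The sketch below assumes the exponential kernel $g(u) = e^{-\mu u}$ $(u \geq 0)$ driven by an orthogonal input with $v(\omega) = \sigma^2/2\pi$; these choices, the function names and the truncation point `upper` are assumptions made for illustration. The gain is computed by a midpoint rule and compared with the closed form $1/(\mu + i\omega)$; with $\sigma_X^2 = \sigma^2/2\mu$ from (2), formula (10) then gives $f_X(\omega) = \mu/\{\pi(\mu^2+\omega^2)\}$.

```python
import cmath
import math

def gain_numeric(omega, g, upper, steps=100000):
    """h(omega) = int_0^upper g(v) exp(-i*omega*v) dv by the midpoint rule;
    'upper' truncates the infinite range and must be chosen large enough."""
    h = upper / steps
    total = 0j
    for i in range(steps):
        v = (i + 0.5) * h
        total += g(v) * cmath.exp(-1j * omega * v)
    return total * h

def density_from_gain(h_value, mu):
    """f_X = |h|^2 v / sigma_X^2 with v = sigma^2/(2 pi), sigma_X^2 = sigma^2/(2 mu);
    the input variance rate sigma^2 cancels."""
    return abs(h_value) ** 2 * (2.0 * mu) / (2.0 * math.pi)

mu, omega = 1.5, 2.0
h_num = gain_numeric(omega, lambda v: math.exp(-mu * v), upper=30.0)
f_num = density_from_gain(h_num, mu)
f_closed = mu / (math.pi * (mu * mu + omega * omega))
```

The modulus of the gain, not its phase, enters the spectral density, so the sign convention chosen for the exponent in (4) does not affect `f_num`.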


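If $U(t)$ is obtained from $X(t)$ by a linear differential operator $\phi(d/dt) \equiv \phi(D)$, or by a linear difference operator $\psi(E_\tau) \equiv \psi(e^{\tau D})$, (10) then gives formally,

$$\sigma_X^2 f_X(\omega) = v(\omega)/|\phi(i\omega)|^2, \tag{11}$$

a result easily seen to be valid if $\phi(i\omega)$ does not vanish for any real $\omega$. Examples referred to in § 5.2 were:

(i) $e^{D} - \rho$, (ii) $e^{2D} + a e^{D} + b$, (iii) $D^2 + \alpha D + \beta$,

and (by implication) the Markov continuous time case (iv) $D + \mu$. For (iv), formula (11) gives

$$f_X(\omega) = \frac{1}{\sigma_X^2}\,\frac{v(\omega)}{\mu^2+\omega^2} = \frac{\mu}{\pi}\,\frac{1}{\mu^2+\omega^2} \tag{12}$$

in the orthogonal case. Similarly we find for (iii)

$$f_X(\omega) = \frac{\alpha\beta}{\pi}\,\frac{1}{(\beta-\omega^2)^2+\omega^2\alpha^2} \tag{13}$$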

in the orthogonal case. The corresponding spectra for (i) and (ii) are referred to in Chapter 9.

The question which now arises is how general such a representation (3) is for the covariance function. If the spectrum of a stationary process is assumed to be (absolutely) continuous, and we write

$$\alpha(\omega)\,\alpha^*(\omega) = \sigma^2 f(\omega), \tag{14}$$

then

$$w(\tau) = \int_{-\infty}^{\infty} e^{i\tau\omega}\,\alpha(\omega)\,\alpha^*(\omega)\,d\omega,$$

which, from Parseval's theorem,

$$= \int_{-\infty}^{\infty} \beta^*(u)\,\beta(u+\tau)\,du, \tag{15}$$

where

$$\beta(u) = \operatorname*{l.i.m.}\ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{iu\omega}\,\alpha(\omega)\,d\omega. \tag{16}$$

Equation (15) is identified with (3) by putting $\sigma g(u) = \beta(u)$, and implies (by appeal to the general expansion theorem quoted in § 5.11) the existence of an expansion

$$X(t) = \int_{-\infty}^{\infty} g(t-u)\,dY(u), \tag{17}$$

where $Y(u)$ is orthogonal. The representation (17) is thus general for any stationary process with spectral density $f(\omega)$, but it does not follow that an expansion of the form (1) is possible with $Y(u)$ orthogonal, for this requires further that $g(u)$ can be chosen so that $g(u) = 0$ for $u < 0$. This condition is given in a theorem by Paley and Wiener (1934, Theorem XII) that we must have

$$\int_{-\infty}^{\infty} \frac{|\log f(\omega)|}{1+\omega^2}\,d\omega < \infty. \tag{18}$$
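Condition (18) can be explored numerically by watching how the integral over $(-L, L)$ behaves as $L$ grows. The sketch below compares two densities chosen purely for illustration: the density $\mu/\{\pi(\mu^2+\omega^2)\}$ of formula (12), and a Gaussian density $(2\pi)^{-1/2}e^{-\omega^2/2}$, which is an assumption of this example. The functions take $\log f$ rather than $f$ to avoid floating-point underflow in the tails.

```python
import math

def paley_wiener_partial(log_f, L, steps=200000):
    """Midpoint approximation to int_{-L}^{L} |log f(w)| / (1 + w^2) dw,
    where log_f(w) returns log f(w)."""
    h = 2.0 * L / steps
    total = 0.0
    for i in range(steps):
        w = -L + (i + 0.5) * h
        total += abs(log_f(w)) / (1.0 + w * w)
    return total * h

def log_lorentz(w, mu=1.0):
    return math.log(mu / math.pi) - math.log(mu * mu + w * w)

def log_gauss(w):
    return -0.5 * w * w - 0.5 * math.log(2.0 * math.pi)

# Growth of the partial integral when L is doubled from 100 to 200.
lorentz_growth = paley_wiener_partial(log_lorentz, 200.0) - paley_wiener_partial(log_lorentz, 100.0)
gauss_growth = paley_wiener_partial(log_gauss, 200.0) - paley_wiener_partial(log_gauss, 100.0)
```

For the first density the added contribution is small and shrinking, consistent with convergence; for the Gaussian density the integrand tends to $\tfrac12$, so the partial integral grows roughly like $L$ and (18) fails. A finite computation of this kind suggests, but of course does not prove, convergence or divergence.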
To obtain $g(u)$ (or $\beta(u)$) from $\alpha(\omega)$, satisfying the condition $g(u) = 0$ for $u < 0$, we choose $\alpha(z)$ to have no singularities or zeros in the lower half-plane when regarded as a function of the complex variable $z$ (the condition of no zeros avoids singularities in $1/\alpha(z)$, and is required for the use made of $\alpha(z)$ in Chapter 7). For example, if

$$\sigma^2 f(\omega) = \frac{c^2}{2\pi(\mu^2+\omega^2)},$$

we take

$$\alpha(\omega) = \frac{c}{\sqrt{2\pi}\,(\mu + i\omega)},$$

giving $\beta(u) = c\,e^{-\mu u}$ $(u > 0)$.


More generally when f(w) is of the form 


OTT w- (o—u£) 
zd (C real, m> 2n, F (u,) » 0), 
Il, (079) (o— o7) 


we may take 


a9) - C TL (oo) i W-o). (19) 


' As one example where (18) is not satisfied, we note the admis- 
sible process X(t) for which 


fem Tae, pee. (20) 


This process is analytic (see § 5-11) 
next chapter, it is thus not surpri: 
sented by a formula like (1), which 
a non-deterministic process, 
d Y (t) arising as time proceeds, 


can regard (1) as defining a process X(t 
a H tl a z 
ministic ‘to the second order’ ca 


» ie. linear transformations of 
X(u) for u<t do not enable the increm 
ents d Y. 
be deduced. Ee 


; and, as we shall see in the 


6.31 Further examples of stationary processes. Some further examples of stationary processes liable to arise in practice will be considered from the point of view of their correlational and spectral properties. In contrast with the class of processes of § 6.3, a classical Fourier series has a discrete spectrum, and constitutes an example of a stationary process if the phases of the various harmonic components are independently and uniformly distributed over $(0, 2\pi)$. To note the simplest case,

$$X(t) = A\cos(\omega_1 t + \Phi) \tag{1}$$

gives

$$E\{X^*(t)\,X(t+\tau)\} = \tfrac12\sigma_A^2\cos\omega_1\tau,$$

if $\Phi$ is uniformly distributed over $(0, 2\pi)$, and $E\{A\} = 0$.

For the Markov chain of equation (5), § 3.41, recalled at the beginning of the present chapter, we had the conditional p.g.f., given $X(t) = n$ at $t = 0$,

$$\Pi(z) = [1 + T(z-1)]^n \exp\{(\nu/\mu)(z-1)(1-T)\},$$

where $T = e^{-\mu t}$. Hence

$$E\{X(t) \mid n\} = [\partial\Pi(z)/\partial z]_{z=1} = nT + \nu(1-T)/\mu, \tag{2}$$

showing that the regression of $X(t)$ on $X(0)$ is linear and that the autocorrelation function is $T = e^{-\mu t}$ $(t \geq 0)$. (Notice that this form of the autocorrelation function, contrary to an opinion sometimes held, does not necessarily hold for all stationary Markov processes; counter-examples are easily constructed.)

An example of some interest as providing a mixed spectrum is a renewal process involving two states, these being labelled for convenience as $-1$ and $+1$ and thus giving $E\{X(t)\} = 0$ if the probability of either state is $\tfrac12$. The process is not Markovian in the simple sense as we suppose the chance of a transition from either state to the other in $t, t+\Delta t$ is $q(\tau)\Delta t + o(\Delta t)$, depending on its present lifetime $\tau$ in the state it is occupying. The chance $Q(u)$ of no transition in time $u$, measured from the last transition, is then

$$Q(u) = \exp\left\{-\int_0^u q(\tau)\,d\tau\right\}. \tag{3}$$


The theory of the number of transitions occurring in any interval $\tau$ was developed in § 2.11. We have, if the probability of $r$ transitions is $p_r(\tau)$ ($r = 0, 1, \ldots$, not $1, 2, \ldots$, as in § 2.11), and

$$P(\lambda, \tau) = \sum_{r=0}^{\infty} \lambda^r p_r(\tau),$$

then

$$P(\lambda, \tau) = 1 + (\lambda - 1)\int_0^{\tau} g(u, \lambda)\,du$$

(equation (13), § 2.11), where

$$g(u, \lambda) = f(u) + \int_0^u \lambda f(u-y)\,g(y, \lambda)\,dy,$$

$f(u)$ here being $-\partial Q(u)/\partial u$. The correlation coefficient is easily seen to depend on the numbers of odd and even transitions; in fact

$$\rho(\tau) = p_0(\tau) - p_1(\tau) + p_2(\tau) - \ldots = P(-1, \tau). \tag{4}$$

Two possibilities arise:
Two possibilities arise: 
(a) the integral in (3) diverges, $Q(\infty) = 0$, and the lifetime distribution exists as a continuous distribution. To take a specific case, let

$$q(\tau) = \frac{\mu^2\tau}{1+\mu\tau}, \tag{5}$$

which tends to $\mu$ for large $\tau$, but is equivalent to $\mu^2\tau$ for small $\tau$, so that transitions are inhibited by previous transitions. Then from (3) we find

$$Q(u) = (1+\mu u)\,e^{-\mu u}, \qquad f(u) = \mu^2 u\,e^{-\mu u},$$

so that this case is the one treated at the end of § 2.11 and again in § 6.12. Comparing (4) with equation (19), § 2.11, we have

$$\rho(\tau) = e^{-\mu\tau}[\cos\mu\tau + \sin\mu\tau] \quad (\tau \geq 0). \tag{6}$$

The spectral density is found by inversion to be

$$f(\omega) = \frac{1}{\pi}\,\frac{4\mu^3}{\omega^4 + 4\mu^4}. \tag{7}$$

(b) if the integral in (3) converges, there is a finite chance of 
no further transitions ever taking place. For example, let 


= Lu 
gn n ay (8) 
hat "qir)dr- P 
so tha fran a= ae as u>. 


Then $Q(\infty) = e^{-1}$. The chances of no transition, one transition, two transitions, etc., as $\tau \to \infty$ are therefore

$$e^{-1},\quad (1-e^{-1})\,e^{-1},\quad (1-e^{-1})^2 e^{-1},\ \ldots,$$

and there is a positive contribution to $\rho(\infty)$ given by

$$e^{-1} - (1-e^{-1})\,e^{-1} + (1-e^{-1})^2 e^{-1} - \ldots = e^{-1}/(2 - e^{-1}). \tag{9}$$

This leads to a discrete component in the spectrum at zero frequency. A discrete component with non-zero frequency would similarly arise if $Q(u)$ had a discontinuity at some value of $u$.

6.4 Complete stationarity

While the correlational and spectral properties of stationary processes depend merely on their second-order moments, we have noted various examples of completely stationary processes, e.g. the stationary Markov chain, the normal stationary process and the linear process. It may be asked whether there is any simple alternative condition for complete stationarity, analogous to the Fourier relation required for the second-order moment in the case of second-order stationarity. This latter condition was most simply expressed in terms of the Fourier expansion of $X(t)$, for if

$$X(t) = \int_{-\infty}^{\infty} e^{it\omega}\,dZ(\omega), \tag{1}$$

then it was noted in § 6.2 that $Z(\omega)$ must be orthogonal, i.e.

$$E\{\Delta Z(\omega)\,\Delta Z^*(\omega')\} = 0 \quad (\Delta\omega, \Delta\omega' \text{ non-overlapping}). \tag{2}$$

From (2) it is seen that non-zero contributions to the total mean square $E\{|X(t)|^2\}$ only arise from $\omega = \omega'$. However, this is in terms of the double integral involving $dZ(\omega)$, $dZ^*(\omega')$, and for real processes, for which it may be seen that we require

$$dZ^*(\omega) = dZ(-\omega),$$

the condition in terms of $Z(\omega)$ alone becomes $\omega + \omega' = 0$. This condition may be extended to the complete stationarity case. Suppose the characteristic functional of the real process $X(t)$ is

$$C\{\theta\} = E\left\{\exp\left[i\int_{-\infty}^{\infty} X(t)\,d\theta(t)\right]\right\}. \tag{3}$$

As the 'scalar product' $\int X(t)\,d\theta(t)$ of two functions $X(t)$, $\theta(t)$ is invariant, $C\{\theta\}$ is invariant; the linear transformation represented by (1) is (for suitable regularity conditions on $\theta(t)$) thus counteracted by the linear transformation

$$\phi(\omega) = \int_{-\infty}^{\infty} e^{it\omega}\,d\theta(t), \tag{4}$$

whence the characteristic functional of $Z(\omega)$ is

$$C\{\phi\} = E\left\{\exp\left[i\int_{-\infty}^{\infty} \phi(\omega)\,dZ(\omega)\right]\right\}. \tag{5}$$

If $X(t)$ is completely stationary, the properties of $X(t)$ are unaltered by writing $t+\tau$ for $t$. Making this change in (1), we see that $C\{\phi\}$ is unaltered if we replace $dZ(\omega)$ by $e^{i\tau\omega}dZ(\omega)$, or equivalently, if we replace $\phi(\omega)$ in (5) by $\phi(\omega)e^{i\tau\omega}$. In terms of the higher moments of $Z(\omega)$ of order $r$, if we assume that these are finite, it implies that non-zero contributions can only arise in the hyperplane

$$\omega_1 + \omega_2 + \ldots + \omega_r = 0, \tag{6}$$

this being the appropriate extension of the condition $\omega + \omega' = 0$.

In terms of real functions, we may write the real process $X(t)$ as

$$X(t) = \int_0^{\infty} [\cos t\omega\,dU(\omega) + \sin t\omega\,dV(\omega)], \tag{7}$$

where $U(\omega) = Z(\omega) - Z(-\omega)$, $V(\omega) = i[Z(\omega) + Z(-\omega)]$, so that degeneracy conditions on $U(\omega)$, $V(\omega)$ can be obtained from those on $Z(\omega)$.

A class of completely stationary processes. If $X(t)$ is a real process stationary to the second order with the representation (1), and $Z(\omega)$ is additive as well as orthogonal, is $X(t)$ completely stationary? By $Z(\omega)$ additive is meant that $U(\omega)$, $V(\omega)$ is a bivariate additive process with cumulant function

$$K(\alpha, \beta; \omega) = \int_0^{\omega} dK(\alpha, \beta; u), \tag{8}$$

say. With this additivity assumption, the cumulant (log characteristic) functional for $X(t)$ is

$$\log C\{\theta\} = \int_0^{\infty} dK\left(i\!\int\!\cos u\omega\,d\theta(u),\ i\!\int\!\sin v\omega\,d\theta(v);\ \omega\right). \tag{9}$$

Consider first the distribution of $X(t)$ at a single time-instant $t$, determined by its cumulant function

$$\int_0^{\infty} dK(i\theta\cos t\omega,\ i\theta\sin t\omega;\ \omega). \tag{10}$$

This is independent of $t$ (for all $\theta$) if

$$K(\alpha, \beta) \equiv L(\alpha^2+\beta^2) = L([\alpha + i\beta][\alpha - i\beta]), \tag{11}$$

and with this assumption (9) becomes

$$\int_0^{\infty} dL\left(-\left[\int e^{iu\omega}\,d\theta(u)\right]\left[\int e^{-iv\omega}\,d\theta(v)\right];\ \omega\right), \tag{12}$$

which is invariant to translation of the $t$-axis.

Notice that if it were assumed further that $U(\omega)$, $V(\omega)$ were independent of each other, then it would follow further that $Z(\omega)$ and hence $X(t)$ were normal processes, since the only solution of (11) satisfying also $K(\alpha, \beta) = K_1(\alpha) + K_2(\beta)$ is $\tfrac12\sigma^2(\alpha^2+\beta^2)$. However, the completely stationary process (12) need not be normal, arising in general from an additive process $U(\omega)$, $V(\omega)$ with a distribution function isotropic in the two variables (cf. § 6.51).

If we assumed that $X(t)$ is normal, then it follows from the linear relation between $X(t)$ and $Z(\omega)$ that $Z(\omega)$ is normal (for stationarity to the second order it is easily shown from (7) that

$$E\{U^2(\omega)\} = E\{V^2(\omega)\} \quad \text{and} \quad E\{U(\omega)V(\omega)\} = 0$$

and thus $U(\omega)$, $V(\omega)$ is a normal isotropic bivariate additive process). In turbulence theory it has been asked if a converse is true, that if $Z(\omega)$ is additive, is $X(t)$ and hence $Z(\omega)$ necessarily normal? The above more general class of completely stationary process shows that this does not follow from these conditions alone, even if it is assumed that $Z(\omega)$ has no fixed discontinuities, i.e. the spectrum of $X(t)$ is continuous. However, it is reasonable in this physical context to assume also that $X(t)$ is ergodic in regard to the measurement of its autocovariance. This would imply that the process

$$Y(t) = X(t)\,X(t+\tau) - w(\tau)$$

is ergodic, i.e.

$$\operatorname*{l.i.m.}_{T\to\infty} \frac{1}{T}\int_0^T Y(t)\,dt = 0, \tag{13}$$

an equivalent condition for which is seen to be

$$\lim_{T\to\infty} \frac{1}{T^2}\int_0^T\!\!\int_0^T W(t, u)\,dt\,du = 0. \tag{14}$$

Here $W(t, u)$ is the covariance function of $Y(t)$ and is a function of $t-u$ if $X(t)$ is completely stationary, but not necessarily if $X(t)$ is only given as stationary to the second order. It may be shown, even with this weaker condition, that (14) cannot be satisfied unless the additive process $Z(\omega)$ is normal. It then follows that $X(t)$ is normal and completely stationary. (A rigorous proof of this last result has been obtained by J. Eklind.)


6.41 Recurrence times for completely stationary processes. In § 3.3 the distributional theory of recurrence and passage times was developed, but only in the case of 'renewal' processes, which included Markov processes, was a very complete theory practicable. Recurrence theory is especially of practical interest for completely stationary processes, even if they are not of the 'renewal' type, for they will in general exhibit continual recurrence, this phenomenon being linked with ergodicity properties. We shall consider stationary processes for which the ergodicity property holds in the most complete sense (not just for the time average of $X(t)$ or of $X(t)X(t+\tau)$), by which is meant that all probability distributions characterizing a process $X(t)$ are generated by any single realization $x(t)$. In terms of the characteristic functional (equation (3), § 6.4), defined, say, for $d\theta(t) = 0$ outside an arbitrary range $(0, \tau)$, this will be obtained from the time-average

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T \exp\left\{i\int_0^{\tau} x(t+u)\,d\theta(u)\right\} dt.$$


This is of course a strong ergodic property, but may reasonably be assumed for many physical applications, for example in the theory of molecular density fluctuations. As first noted by Smoluchowski, the extent to which the irreversibility represented by the Second Law of Thermodynamics continues to hold is dependent on the order of magnitude of the recurrence times and on the corresponding length of time-scale one is prepared to consider.

Even with these assumptions it only seems possible to give general formulae of a practicable kind for mean recurrence times. One powerful but simple method originally used by Smoluchowski (1915) is available when the recurring state S has a non-zero probability of occurrence at any instant.

It will be convenient to refer to the number of separate times a state is occupied in a 'long' interval of time $T$, on the understanding that all derived formulae will involve only ratios of such numbers, and hold in the limit as $T \to \infty$. We then denote, in the case of discrete time (or intermittent observation), the frequency of the specified state S being occupied continuously for an interval length $k$ by $N_k$. Similarly, we denote by $M_k$ the frequency for the composite state 'not S'. Then on our assumptions we necessarily have

$$P(S) = \frac{N_1 + 2N_2 + 3N_3 + \ldots}{N_1 + 2N_2 + 3N_3 + \ldots + M_1 + 2M_2 + 3M_3 + \ldots}, \tag{1}$$

$$P(S \mid S) = \frac{N_2 + 2N_3 + \ldots}{N_1 + 2N_2 + 3N_3 + \ldots}, \tag{2}$$

$$1 = \frac{N_1 + N_2 + N_3 + \ldots}{M_1 + M_2 + M_3 + \ldots}, \tag{3}$$

where $P(S \mid S)$ denotes the conditional probability of S, given S at the preceding instant, and formula (3) follows from the equality in frequency of excursions from and to state S. If we now define the lifetime $t_1$ in state S as the total continuous interval in that state, we have by definition for the mean lifetime

$$T_1 \equiv E\{t_1\} = \tau\,\frac{N_1 + 2N_2 + \ldots}{N_1 + N_2 + \ldots} = \frac{\tau}{1 - P(S \mid S)}, \tag{4}$$

where $\tau$ is the interval between consecutive time-instants. Similarly, if the recurrence time $\theta_1$ is the total continuous interval out of state S, then

$$\Theta_1 \equiv E\{\theta_1\} = \tau\,\frac{M_1 + 2M_2 + \ldots}{M_1 + M_2 + \ldots} = T_1\,\frac{1 - P(S)}{P(S)}. \tag{5}$$

To obtain the equivalent formulae as the time becomes strictly continuous (i.e. $\tau \to 0$), we assume that the limiting probability $P(\text{not } S \mid S)$ is of the form $\lambda\tau + o(\tau)$ (cf. the case of Markov chains, § 3.3), so that formula (4) becomes in the limit

$$T_1 = 1/\lambda. \tag{6}$$

Formula (5) continues to hold; in the particular case when S becomes a rare state, so that $1 - P(S) \sim 1$, it becomes $\Theta_1 \sim T_1/P(S)$.
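Formulae (1) to (5) are pure bookkeeping on run lengths, so they can be checked exactly on a small record. The sketch below assumes an observed sequence that alternates between runs in S and runs in 'not S', with equally many runs of each kind as formula (3) requires; the function name and the run lengths are hypothetical. Exact rational arithmetic is used so that the two forms of (4) and of (5) can be compared without rounding.

```python
from collections import Counter
from fractions import Fraction

def run_statistics(s_runs, not_s_runs, tau=1):
    """Return (P(S), P(S|S), T1, Theta1) from the lengths of the alternating
    runs in S and in 'not S', following formulae (1), (2), (4) and (5)."""
    if len(s_runs) != len(not_s_runs):
        raise ValueError("formula (3) requires equally many runs of each kind")
    N = Counter(s_runs)        # N[k]: number of S-runs of length k
    M = Counter(not_s_runs)    # M[k]: number of 'not S' runs of length k
    in_s = sum(k * n for k, n in N.items())
    out_s = sum(k * m for k, m in M.items())
    p_s = Fraction(in_s, in_s + out_s)                                    # (1)
    p_s_given_s = Fraction(sum((k - 1) * n for k, n in N.items()), in_s)  # (2)
    mean_life = Fraction(tau) * Fraction(in_s, len(s_runs))               # (4), first form
    mean_recurrence = Fraction(tau) * Fraction(out_s, len(not_s_runs))    # (5), first form
    return p_s, p_s_given_s, mean_life, mean_recurrence

p_s, p_ss, T1, Theta1 = run_statistics([1, 3, 2, 2], [4, 1, 6, 5])
```

For this record the second forms of (4) and (5), $\tau/\{1 - P(S \mid S)\}$ and $T_1\{1 - P(S)\}/P(S)$, reproduce `T1` and `Theta1` exactly, which is the content of the two equalities in the text.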
Application to the theory of density fluctuations. To illustrate this last formula we quote Smoluchowski's original results in the fluctuation theory of molecular concentrations. For a spherical volume of radius $a$, and independent motion of the molecules, the value $\lambda$ may be calculated by standard arguments from the kinetic theory of gases, and is found to be

$$\lambda = \frac{3}{a}\left(\frac{kT}{2\pi m}\right)^{1/2}(\nu + n),$$

where the state S denotes $n$ molecules in the volume, $\nu$ is the average number, and the expression in the square root has the usual thermodynamical interpretation. The distribution of $n$, in general Poissonian, is approximated for $\nu$ large by the normal law

$$P(n) \sim \frac{1}{\sqrt{2\pi\nu}}\exp\left\{-\tfrac12\,\frac{(n-\nu)^2}{\nu}\right\}.$$

We thus finally obtain, for $n/\nu \sim 1$,

$$\Theta_1 \sim \frac{\pi a}{3}\left(\frac{m}{kT\nu}\right)^{1/2}\exp\left\{\tfrac12\,\frac{(n-\nu)^2}{\nu}\right\}.$$

This formula is remarkable in its tremendous and rapid change for given $n/\nu$ as the size of volume changes. Thus for 1 % fluctuations in air molecules with density $3 \times 10^{19}$/c.c. and temperature $T = 300°$ K. (cf. Smoluchowski)

$a = 1 \times 10^{-5}$ cm., $\Theta_1$ of order $10^{-11}$ sec.,
$a = 3 \times 10^{-5}$ cm., $\Theta_1$ of order $10^{6}$ sec.,
$a = 5 \times 10^{-5}$ cm., $\Theta_1$ of order $10^{68}$ sec.

In spite of the essential reversibility of stationary processes, 
these results show how an abnormal deviation from the mean 
will be so long in recurring that a temporary impression of 
irreversibility is conveyed; a process may observationally appear 
reversible or irreversible according as the recurrence time of the 
initial state is short or long compared with the time for which 
the system is under observation. 

Extension of Smoluchowski's formulae. We may derive a useful extension of Smoluchowski's mean-value formulae. We define

(i) the 'further lifetime' $t_2$ of state S, given S at an arbitrary instant;

(ii) the 'further recurrence time' $\theta_2$ of state S, given 'not S' at an arbitrary instant.

Then by definition, for discrete time,

$$T_2 \equiv E\{t_2\} = \tau\,\frac{N_1 + (1+2)N_2 + (1+2+3)N_3 + \ldots}{N_1 + 2N_2 + 3N_3 + \ldots}, \tag{7}$$

$$\Theta_2 \equiv E\{\theta_2\} = \tau\,\frac{M_1 + (1+2)M_2 + (1+2+3)M_3 + \ldots}{M_1 + 2M_2 + 3M_3 + \ldots}. \tag{8}$$

However, we may define also the complete p.g.f.'s for $t_1$ and $t_2$,

$$\Pi_{t_1}(z) = \frac{zN_1 + z^2N_2 + \ldots}{N_1 + N_2 + \ldots} \quad (\tau = 1),$$

$$\Pi_{t_2}(z) = \frac{zN_1 + (z+z^2)N_2 + \ldots}{N_1 + 2N_2 + \ldots} = \frac{z}{1-z}\left[\frac{N_1 + N_2 + \ldots}{N_1 + 2N_2 + \ldots} - \frac{zN_1 + z^2N_2 + \ldots}{N_1 + 2N_2 + \ldots}\right],$$

or

$$T_1(1-z)\,\Pi_{t_2} = z(1 - \Pi_{t_1}). \tag{9}$$

A similar relation obviously holds between $\Pi_{\theta_1}$ and $\Pi_{\theta_2}$ for the recurrence times $\theta_1$ and $\theta_2$, i.e.

$$\Theta_1(1-z)\,\Pi_{\theta_2} = z(1 - \Pi_{\theta_1}). \tag{10}$$

In the case of a Markov chain, the transitions of the process, given S at any instant, are independent of the state at previous instants, so that the variables $t_1$ and $t_2$ are stochastically equivalent. This immediately yields from (9), if we put $\Pi_{t_1} = \Pi_{t_2}$,

$$\Pi_{t_1} = \frac{z}{z + T_1(1-z)}, \tag{11}$$

a result which could of course be obtained more directly. It should perhaps be stressed that the same result does not hold for $\Pi_{\theta_1}$, as $\theta_1$ and $\theta_2$ are not in general stochastically equivalent, even for a Markov process. In other words, a Markov process does not retain its Markovian character under grouping of its states (as we have noted earlier), and 'not S' cannot therefore be treated as a single state like S, except in the trivial case of only two states in all.
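The relation between the two lifetime generating functions can be verified exactly for any table of run counts. The sketch below assumes $\tau = 1$, takes the p.g.f. of the lifetime as $\sum_k N_k z^k / \sum_k N_k$ and that of the further lifetime as $\sum_k N_k(z + \ldots + z^k)/\sum_k kN_k$, and checks $T_1(1-z)\Pi_{t_2} = z(1-\Pi_{t_1})$; the function name and the counts are hypothetical.

```python
from fractions import Fraction

def lifetime_pgfs(N, z):
    """N maps a run length k to its frequency N_k. Returns (T1, Pi_t1(z), Pi_t2(z))
    with the time step tau taken as unity."""
    runs = sum(N.values())
    instants = sum(k * n for k, n in N.items())
    T1 = Fraction(instants, runs)
    # p.g.f. of the complete lifetime: each run of length k has weight N_k.
    pi1 = sum(n * z ** k for k, n in N.items()) / runs
    # p.g.f. of the further lifetime: an instant chosen at random inside a run of
    # length k has further lifetime 1, 2, ..., k with equal frequency.
    pi2 = sum(n * sum(z ** j for j in range(1, k + 1)) for k, n in N.items()) / instants
    return T1, pi1, pi2

counts = {1: 3, 2: 1, 4: 2}       # hypothetical run-length frequencies
z = Fraction(2, 5)
T1, pi1, pi2 = lifetime_pgfs(counts, z)
lhs = T1 * (1 - z) * pi2
rhs = z * (1 - pi1)
```

Because `z` is a `Fraction`, `lhs` and `rhs` agree exactly rather than to rounding error; the same function applied to the counts $M_k$ gives the corresponding relation for the recurrence times.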
When the interval $\tau$ (taken for convenience as unity in (9), (10) and (11)) tends to zero, (9) and (10) become

$$\psi T_1 M_{t_2}(\psi) = 1 - M_{t_1}(\psi), \tag{12}$$

$$\psi \Theta_1 M_{\theta_2}(\psi) = 1 - M_{\theta_1}(\psi), \tag{13}$$

and (11) becomes the exponential distribution. It is of interest to observe that the converse to (11) is also true and the relation (12) can lead to the exponential distribution for $M_{t_1}$ only if $t_1$ and $t_2$ are stochastically equivalent. For let $M_{t_1} = 1/(1 + \psi T_1)$; then $M_{t_2}(\psi)$ is also $1/(1 + \psi T_1)$.

If in (13) we do impose the condition that $\theta_1$ and $\theta_2$ are stochastically equivalent, we obtain the exponential distribution for $M_{\theta_1}(\psi)$ and $M_{\theta_2}(\psi)$. This may be used in the recurrence theory of finite Markov chains (§ 3.32) to obtain the condition that the recurrence distribution should reduce to this distribution. Denote the original transition-matrix by


"A 


where it should be noted that the vector Y, defined in § 3-32 as 
the initial probability vector for the remaining states, is pro- 
portional to b for 6,, and for 0, to the asymptotic distribution of 
the individual ‘not S’ states given by the column latent vector 
of the zero latent root of R (in view of the ergodic assumption, 
there is only one zero root of R). Hence the required condition is 


Sar ere) 


whence b must be a latent vector of C, with corres: 


ponding latent 
root —4=x'b/a. A simple example is 
R-/-2i 2 4 
l1; -4 1 
ii z-s 


for which Mgl) - Mz) =1/(1 +44). 
Processes with an infinity of states. When the probability of S becomes strictly zero, it is not very convenient to obtain the appropriate formulae for recurrence by direct passage from the above results, as the concept of lifetime in a state breaks down; in the case of a single variable $X$ with continuous range, it is replaced by the rate of change $\dot X$ of $X$, assuming this to exist in the m.s. or other sense, so that the simultaneous probability $f(\xi, \eta)\,d\xi\,d\eta$ of $X$, $\dot X$ exists. As the process is stationary $f$ will not depend on $t$. Thus for a given positive slope $\eta$ the probability of $X = x$ in the time interval $t, t+\tau$ as $\tau \to 0$ is

$$\tau\,\eta\,f(x, \eta)\,d\eta, \tag{14}$$

and hence, for any slope, we have formally the density in time

$$\int_{-\infty}^{\infty} |\eta|\,f(x, \eta)\,d\eta$$

and an expected recurrence time

$$\Theta_1 = 1\Big/\int_{-\infty}^{\infty} |\eta|\,f(x, \eta)\,d\eta. \tag{15}$$

The formula (14) may be rigorously justified if $f(\xi, \eta)$ is continuous in $\xi$, $\eta$ and the associated integral over $\eta$ converges uniformly in some interval about $x$ (see Rice, 1944-5).

As an example consider a normal process for which $X$ and $\dot X$, being uncorrelated for a stationary process, are independent,

$$f(\xi, \eta) = \frac{1}{2\pi\sigma\sigma_1}\exp\left\{-\tfrac12\left(\frac{\xi^2}{\sigma^2} + \frac{\eta^2}{\sigma_1^2}\right)\right\},$$

where we know from the general representation

$$X(t) = \int_{-\infty}^{\infty} e^{it\omega}\,dZ(\omega), \qquad \dot X(t) = \int_{-\infty}^{\infty} i\omega\,e^{it\omega}\,dZ(\omega),$$

that

$$\sigma_1^2 = \sigma^2\int_{-\infty}^{\infty} \omega^2\,dF(\omega).$$

Now

$$\int_{-\infty}^{\infty} |\eta|\,f(x, \eta)\,d\eta = \frac{\sigma_1}{\pi\sigma}\exp\left\{-\tfrac12\,\frac{x^2}{\sigma^2}\right\},$$

so that

$$\Theta_1 = \frac{\pi}{\sigma_\omega}\exp\left\{\tfrac12\,\frac{x^2}{\sigma^2}\right\}, \tag{16}$$

where $\sigma_\omega$ is the standard deviation of the spectral distribution of $X$. For a normal Markov process $\sigma_\omega$ is infinite and $\dot X$ does not exist, but the methods indicated at the end of § 3.3 may be used to confirm that in this case $\Theta_1$ is strictly zero.
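The normal example can be checked by direct numerical integration. The sketch below assumes that $X$ and its derivative are independent normal variables with standard deviations $\sigma$ and $\sigma_1 = \sigma\sigma_\omega$, integrates $|\eta| f(x, \eta)$ over $\eta$ by the midpoint rule, and compares the reciprocal with the closed form $(\pi/\sigma_\omega)\exp(x^2/2\sigma^2)$; the function names, the truncation at twelve standard deviations and the numerical values are assumptions of this illustration.

```python
import math

def crossing_density(x, sigma, sigma1, steps=200000):
    """int |eta| f(x, eta) d eta for independent normal X and dX/dt,
    truncated at +-12 sigma1 and evaluated by the midpoint rule."""
    eta_max = 12.0 * sigma1
    h = 2.0 * eta_max / steps
    norm = 1.0 / (2.0 * math.pi * sigma * sigma1)
    total = 0.0
    for i in range(steps):
        eta = -eta_max + (i + 0.5) * h
        f = norm * math.exp(-0.5 * (x * x / sigma ** 2 + eta * eta / sigma1 ** 2))
        total += abs(eta) * f
    return total * h

def mean_recurrence_time(x, sigma, sigma_omega):
    """Closed form (pi / sigma_omega) * exp(x^2 / (2 sigma^2))."""
    return math.pi / sigma_omega * math.exp(0.5 * x * x / sigma ** 2)

sigma, sigma_omega, x = 2.0, 1.5, 1.0
numeric = 1.0 / crossing_density(x, sigma, sigma * sigma_omega)
closed = mean_recurrence_time(x, sigma, sigma_omega)
```

The even number of steps places $\eta = 0$ on a cell boundary, so the kink of $|\eta|$ does not degrade the midpoint rule. Letting `sigma_omega` grow shows the mean recurrence time shrinking towards zero, as stated for the normal Markov case.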


As an extension of equation (16), the formula for the mean 


recurrence time may be developed from the more general expan- 
sion for f(x, 7) 


QW ^ BM «, 1 Lu 32 
onl x ( z) (a) talaa] lara } 
where for a completely stationary process Koy, = E{X2(t) X(t)} 
(for E{X(t)}=0),=0; ka 2 E(X*(t) X(t)) 20, etc. We obtain by 
straightforward integration, retaining terms up to the fourth 
order, an average density of crossing the point X (t) 2» given by 


refiz 3 z) , Yao (5 6x? ) 


gros $ 
7 0i 6 vi cy 24 \ oi 


Yiz € Yz (x? Yol. la? 
ud uw (5 1) ew {-3 3, (17) 
Yao = H{X?}/03, Yio7 E(X3/01 — 3, 


Yu-FE(XX*](m,08), y, — E(X*X2)(0202) 1, 
Yo 7 E(X501 — 3. 


where 


In particular, if the marginal distribution of X is normal (as 
appears true if X denotes velocity in turbulent motion), 


—y4970, 
and (17) reduces to re 


1 of yi © Yo (x? 2 
= 1442 22 _ Yol] lax 
WON Rea d (5 1) 24 {°*P 739 (18) 


For the average density of zeros, we put further z — 0, 


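6.5 Multivariate and multidimensional stationary processes

The extension of the correlation theory of stationary processes to multivariate processes does not present any particular new difficulties. The structure of the theory is evident if we relate the theory of the (column) vector process $\mathbf{X}(t) \equiv \{X_i(t)\}$ with that of a single process, considered for arbitrary $\lambda_i$,

$$X(t, \boldsymbol\lambda) \equiv \boldsymbol\lambda'\mathbf{X}(t) \equiv \sum_i \lambda_i X_i(t).$$

This representation is adequate for second-order moment properties if $\boldsymbol\lambda$ is complex, allowing the possibility of separating the distinct terms $w_{ij}(\tau)$ and $w_{ji}(\tau)$ in the autocovariance of $X(t, \boldsymbol\lambda)$. These terms are not identical, as, even for real processes $\mathbf{X}(t)$,

$$w_{ij}(\tau) \equiv E\{X_i(t)\,X_j(t+\tau)\}, \qquad w_{ji}(\tau) \equiv E\{X_j(t)\,X_i(t+\tau)\},$$

are not necessarily equal unless $\tau = 0$. In the subsequent notation, it is convenient for complex vectors and matrices to let a transposed vector or matrix, denoted by a dash, take on the conjugate values to the transposed vector or matrix (e.g. we now write $\boldsymbol\lambda' \equiv (\lambda_1^*, \lambda_2^*, \ldots)$). With this convention, we now define the covariance matrix

$$\mathbf{V}(\tau) \equiv E\{\mathbf{X}(t+\tau)\,\mathbf{X}'(t)\}. \tag{1}$$

The general form of Khintchine's theorem for complex $X(t)$ gives

$$w(\tau, \boldsymbol\lambda) \equiv E\{X(t+\tau, \boldsymbol\lambda)\,X^*(t, \boldsymbol\lambda)\} = \int_{-\infty}^{\infty} e^{i\tau\omega}\,dW(\omega, \boldsymbol\lambda). \tag{2}$$

But since also we may write, from equation (1) of § 6.2,

$$\mathbf{X}(t) = \int_{-\infty}^{\infty} e^{it\omega}\,d\mathbf{Z}(\omega), \tag{3}$$

$$X(t, \boldsymbol\lambda) = \boldsymbol\lambda'\mathbf{X}(t) = \int_{-\infty}^{\infty} e^{it\omega}\,\boldsymbol\lambda'\,d\mathbf{Z}(\omega),$$

where each item of $\mathbf{Z}(\omega)$, and also $\boldsymbol\lambda'\mathbf{Z}(\omega)$ for arbitrary $\boldsymbol\lambda$, are all orthogonal processes and hence $\mathbf{Z}(\omega)$ a vector orthogonal process, we obtain

$$\mathbf{V}(\tau) = \int_{-\infty}^{\infty} e^{i\tau\omega}\,d\mathbf{W}(\omega), \tag{4}$$

$$w(\tau, \boldsymbol\lambda) = \boldsymbol\lambda'\mathbf{V}(\tau)\boldsymbol\lambda = \int_{-\infty}^{\infty} e^{i\tau\omega}\,d[\boldsymbol\lambda'\mathbf{W}(\omega)\boldsymbol\lambda], \tag{5}$$

and the Hermitian form $\boldsymbol\lambda'\mathbf{W}(\omega)\boldsymbol\lambda$ must consequently from (2) be the never-decreasing function $W(\omega, \boldsymbol\lambda)$. This property of $\mathbf{W}(\omega)$ was first given by Cramér (1940).

The correlation properties of linear processes, which, as we saw in § 6.3, provide continuous spectra, were extended to vector processes at the end of § 5.2, so that we need merely note the associated spectral formulae. For the linear process

$$\mathbf{X}(t) = \int_{-\infty}^{t} \mathbf{G}(t-u)\,d\mathbf{Y}(u) \quad (\mathbf{G}(u) = 0 \text{ for } u < 0), \tag{6}$$

where $\mathbf{Y}(u)$ is additive, we obtain for the spectral matrix $\mathbf{W}(\omega)$ the result

$$\mathbf{W}(\omega) = \frac{1}{2\pi}\,\mathbf{H}(\omega)\,\mathbf{V}\,\mathbf{H}'(\omega), \tag{7}$$

as the Fourier inverse of the formula

$$\mathbf{V}(\tau) = \int_{-\infty}^{\infty} \mathbf{G}(v)\,\mathbf{V}\,\mathbf{G}'(v-\tau)\,dv, \tag{8}$$

(it is convenient in the present section, involving complex quantities, to define $\mathbf{V}(\tau)$ equivalent to $\mathbf{V}(-\tau)$ of equation (43), § 5.2). In (7),

$$\mathbf{H}(\omega) = \int_{-\infty}^{\infty} e^{-i\omega v}\,\mathbf{G}(v)\,dv.$$

The relations (7) and (8) also hold of course if $\mathbf{Y}(u)$ is orthogonal but not additive, and further if $\mathbf{G}(u) \neq 0$ for $u < 0$.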

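For the multivariate autoregressive equation of § 5-2,

$$d\mathbf{X}(t) + \mathbf{R}\mathbf{X}(t)\,dt = d\mathbf{Y}(t),$$

we obtain $\mathbf{H}(\omega) = (i\omega\mathbf{I} + \mathbf{R})^{-1}$, so that

$$\frac{d\mathbf{W}(\omega)}{d\omega} = \frac{1}{2\pi}\,(\mathbf{R} + i\omega\mathbf{I})^{-1}\,\mathbf{V}\,(\mathbf{R}' - i\omega\mathbf{I})^{-1}. \qquad (9)$$

To illustrate this rather condensed formula, consider the bivariate case ($\mathbf{R}$ real)

$$\mathbf{R} = \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix}, \qquad \mathbf{V} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$

Then

$$(\mathbf{R} + i\omega\mathbf{I})^{-1} = \frac{1}{\Delta}\begin{pmatrix} \alpha_{22} + i\omega & -\alpha_{12} \\ -\alpha_{21} & \alpha_{11} + i\omega \end{pmatrix},$$

where $\Delta = \alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21} + i\omega(\alpha_{11} + \alpha_{22}) - \omega^2$, and

$$\frac{d\mathbf{W}(\omega)}{d\omega} = \frac{1}{2\pi|\Delta|^2}\begin{pmatrix} \alpha_{22} + i\omega & -\alpha_{12} \\ -\alpha_{21} & \alpha_{11} + i\omega \end{pmatrix}\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\begin{pmatrix} \alpha_{22} - i\omega & -\alpha_{21} \\ -\alpha_{12} & \alpha_{11} - i\omega \end{pmatrix}$$

$$= \frac{1}{2\pi\{(\alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21} - \omega^2)^2 + \omega^2(\alpha_{11} + \alpha_{22})^2\}}\begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{pmatrix}, \qquad (10)$$

where

$$\beta_{11} = \sigma_1^2(\alpha_{22}^2 + \omega^2) - 2\rho\sigma_1\sigma_2\,\alpha_{12}\alpha_{22} + \alpha_{12}^2\sigma_2^2,$$

$$\beta_{12} = \beta_{21}^* = -\sigma_1^2\alpha_{21}\alpha_{22} + \rho\sigma_1\sigma_2(\alpha_{12}\alpha_{21} + \alpha_{11}\alpha_{22} + \omega^2) - \sigma_2^2\alpha_{11}\alpha_{12} - i\omega[\sigma_1^2\alpha_{21} + \rho\sigma_1\sigma_2(\alpha_{22} - \alpha_{11}) - \sigma_2^2\alpha_{12}],$$

$$\beta_{22} = \sigma_1^2\alpha_{21}^2 - 2\rho\sigma_1\sigma_2\,\alpha_{11}\alpha_{21} + \sigma_2^2(\alpha_{11}^2 + \omega^2).$$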
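The closed form (10) can be checked numerically against the matrix product (9). The following is a minimal sketch in plain Python, not part of the original text; the function names and the numerical values of $\mathbf{R}$, $\sigma_1$, $\sigma_2$, $\rho$ and $\omega$ are illustrative assumptions.

```python
import math

def matmul2(a, b):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conj_transpose2(a):
    # Conjugate transpose (the "dash" of the text) of a 2x2 complex matrix.
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def inverse2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def spectral_density_direct(R, V, w):
    # (1/2pi) (R + iwI)^-1 V (R' - iwI)^-1, as in formula (9).
    H = inverse2([[R[0][0] + 1j * w, R[0][1]], [R[1][0], R[1][1] + 1j * w]])
    M = matmul2(matmul2(H, V), conj_transpose2(H))
    return [[M[i][j] / (2 * math.pi) for j in range(2)] for i in range(2)]

def spectral_density_closed(R, s1, s2, rho, w):
    # The explicit bivariate expressions beta_11, beta_12, beta_22 of (10).
    a11, a12, a21, a22 = R[0][0], R[0][1], R[1][0], R[1][1]
    c = rho * s1 * s2
    b11 = s1**2 * (a22**2 + w**2) - 2 * c * a12 * a22 + a12**2 * s2**2
    b22 = s1**2 * a21**2 - 2 * c * a11 * a21 + s2**2 * (a11**2 + w**2)
    b12 = (-s1**2 * a21 * a22 + c * (a12 * a21 + a11 * a22 + w**2)
           - s2**2 * a11 * a12
           - 1j * w * (s1**2 * a21 + c * (a22 - a11) - s2**2 * a12))
    denom = 2 * math.pi * ((a11 * a22 - a12 * a21 - w**2)**2
                           + w**2 * (a11 + a22)**2)
    return [[b11 / denom, b12 / denom], [b12.conjugate() / denom, b22 / denom]]

# Illustrative (assumed) parameter values.
R = [[1.0, 0.5], [-0.3, 2.0]]
s1, s2, rho, w = 1.0, 2.0, 0.4, 0.7
V = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
direct = spectral_density_direct(R, V, w)
closed = spectral_density_closed(R, s1, s2, rho, w)
max_diff = max(abs(direct[i][j] - closed[i][j])
               for i in range(2) for j in range(2))
print("largest entry-wise difference:", max_diff)
```

The two routes should agree to rounding error for any real $\mathbf{R}$ with $\Delta \neq 0$; the off-diagonal entries are complex conjugates of one another, reflecting the Hermitian character of the spectral matrix.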
Multidimensional processes. Although a multidimensional process or 'field' $X(\mathbf{u})$, say, must not be confused with a vector process $\mathbf{X}(t)$, the method of studying the properties of a univariate process $X(t)$ by means of the properties of the vector process $\mathbf{X}(t) \equiv \{X(t_1), X(t_2), \ldots, X(t_n)\}$, the set $t_1, \ldots, t_n$ being arbitrary, may evidently be extended to multidimensional processes, and implies a relation between the properties of, say, $X(u, v)$ and $X(u, v_i)$, the set $v_i$ being arbitrary, as well as between $X(u, v)$ and the matrix or tensor variable $X(u_i, v_j)$.

The correlational properties of stationary processes developed in §§ 6-1 and 6-11 extend to multidimensional processes similarly to the extension of characteristic functions to more than one variable. For example, in the case of a two-dimensional stationary process (if the two dimensions are spatial, this is sometimes called a spatially homogeneous process), we write

$$\rho(\tau, \nu) = E\{X^*(t, u)\,X(t+\tau, u+\nu)\}/\sigma^2,$$

where $E\{X(t, u)\} = 0$, $E\{X^*(t, u)\,X(t, u)\} = \sigma^2$, and then have the relations

$$\rho(\tau, \nu) = \int e^{i\tau\omega + i\nu\mu}\,dF(\omega, \mu), \qquad (11)$$

$$X(t, u) = \int e^{it\omega + iu\mu}\,dZ(\omega, \mu), \qquad (12)$$

where $E\{Z(\omega, \mu)\,Z^*(\omega, \mu)\} = \sigma^2 F(\omega, \mu)$,

and the integration is over the whole two-dimensional domain. These are the analogues of formulae (4) of § 6-1 and (1) of § 6-2, and $Z(\omega, \mu)$ is doubly orthogonal in $\omega$ and $\mu$, i.e. there is no contribution to $E\{dZ(\omega, \mu)\,dZ^*(u, v)\}$ except along $\omega = u$, $\mu = v$.


In the statistical theory of turbulence the velocity at a fixed point $\mathbf{r} = (x, y, z)$ is a random vector quantity $\mathbf{U} = (U_1, U_2, U_3)$, depending on the space coordinates $\mathbf{r}$ and on $t$; it is thus both multivariate and multidimensional, and is an example of a random field vector changing with time. The further generalization of (11) and (12) to cover vector processes is evident.


6-51 Isotropy and other special conditions. Any assumptions of symmetry in the case of field quantities impose special restrictions on the correlation and spectral functions. To take a simple case, a stationary process depending on two space co-ordinates $x$ and $y$ will, if circular symmetry is assumed, have a correlation coefficient depending only on $r$, the interval

$$r = \sqrt{(\Delta x)^2 + (\Delta y)^2}$$

between two points. Thus formula (11) of § 6-5 becomes

$$\rho(r) = \int e^{i\omega\Delta x + i\mu\Delta y}\,dF(\omega, \mu). \qquad (1)$$

When the derivative $f(\omega, \mu) = \partial^2 F/\partial\omega\,\partial\mu$ exists, the inverse formula becomes

$$f(\omega, \mu) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i\omega\Delta x - i\mu\Delta y}\,\rho(r)\,d\Delta x\,d\Delta y,$$

or, changing to polar coordinates, we have

$$f(\omega, \mu) = \frac{1}{4\pi^2}\int_0^{\infty}\int_0^{2\pi} e^{-ir(\omega\cos\theta + \mu\sin\theta)}\,\rho(r)\,r\,dr\,d\theta = \frac{1}{2\pi}\int_0^{\infty} J_0(r\eta)\,\rho(r)\,r\,dr, \qquad (2)$$

where $\eta = \sqrt{\omega^2 + \mu^2}$, whence it follows that $f(\omega, \mu) = g(\eta)$, a function of $\eta$ only. Substituting in (1), we obtain the formula inverse to (2),

$$\rho(r) = 2\pi\int_0^{\infty} J_0(r\eta)\,g(\eta)\,\eta\,d\eta. \qquad (3)$$

[...] has been used as a possible correlation function [...]. [...] it does not appear possible to set up a clear isotropic analogue of the one-dimensional linear Markov process leading to the correlation function $e^{-\mu t}$ ($t > 0$), in one time dimension, though two- and three-dimensional Markov processes with defined coordinate axes (corresponding, for example, to successive rows or layers of a crystal structure) do exist. This leads to some difficulty in constructing realizations of isotropic processes. In the case of isotropic point processes in two or three dimensions, arising, for example, in ecological models of plant communities or in models of stellar populations, one isotropic model which has been used is a purely random distribution of 'nuclei', surrounded by 'satellites' with an isotropic distribution of distances from their respective nuclei. Obviously, however, this is a model only appropriate in suitable contexts.
Turning now to vector processes, we shall derive the form of the covariance matrix for a vector process in a space of three dimensions $x_1, x_2, x_3$. This has its most familiar application in isotropic turbulence models, where the vector variable refers to velocity at a given point; we shall consequently denote the variable by $\mathbf{U}(\mathbf{x}; t) \equiv \{U_i(x_1, x_2, x_3; t)\}$ ($i = 1, 2, 3$). The covariance matrix is assumed 'stationary' with respect to $\mathbf{x}$; its dependence on the times $t$, $t+\tau$ is left arbitrary, and need not be indicated. Spherical spatial symmetry then gives certain conditions on the covariance matrix, denoted by $\mathbf{V}(\boldsymbol{\xi})$ for two points separated by the (column) vector $\boldsymbol{\xi} \equiv \{\xi_i\}$. If $\mathbf{m} \equiv \{m_i\}$, $\mathbf{n} \equiv \{n_i\}$ are unit (column) vectors in arbitrary directions, the covariance between the velocities in these directions at the two points is $\mathbf{m}'\mathbf{V}(\boldsymbol{\xi})\mathbf{n}$. Now the assumption of isotropy implies that this covariance depends only on the mutual configuration of $\boldsymbol{\xi}$, $\mathbf{m}$ and $\mathbf{n}$, and not on their orientations with respect to the Cartesian axes chosen. It follows (cf. Robertson, 1940) that the bilinear form $\mathbf{m}'\mathbf{V}(\boldsymbol{\xi})\mathbf{n}$ must be of the general invariant form

$$\mathbf{m}'\mathbf{V}(\boldsymbol{\xi})\mathbf{n} = A(\xi)\,\mathbf{m}'\boldsymbol{\xi}\boldsymbol{\xi}'\mathbf{n} + B(\xi)\,\mathbf{m}'\mathbf{n} + C(\xi)\,|\boldsymbol{\xi}, \mathbf{m}, \mathbf{n}|, \qquad (4)$$

where $|\boldsymbol{\xi}, \mathbf{m}, \mathbf{n}|$ denotes the determinant formed by the components of the three vectors, and $\xi = \sqrt{\boldsymbol{\xi}'\boldsymbol{\xi}}$. The assumption of isotropy also assumes, however, that reflection of the configuration does not affect the value of $\mathbf{m}'\mathbf{V}(\boldsymbol{\xi})\mathbf{n}$ (i.e. the invariance of this scalar holds for the full rotation group, including improper as well as proper rotations). This excludes the last term in (4), as this changes sign on reflection, so that

$$\mathbf{m}'\mathbf{V}(\boldsymbol{\xi})\mathbf{n} = A(\xi)\,\mathbf{m}'\boldsymbol{\xi}\boldsymbol{\xi}'\mathbf{n} + B(\xi)\,\mathbf{m}'\mathbf{n}, \qquad (5)$$

and

$$\mathbf{V}(\boldsymbol{\xi}) = A(\xi)\,\boldsymbol{\xi}\boldsymbol{\xi}' + B(\xi)\,\mathbf{I}. \qquad (6)$$

Equation (6) gives the general form of the covariance matrix (or correlation matrix, as isotropy also implies that the variance of $\mathbf{m}'\mathbf{U}$ is independent of $\mathbf{m}$; in fact from (6), $\mathbf{V}(\mathbf{0}) = B(0)\,\mathbf{I}$).

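A further relation is obtained if the velocities refer to an incompressible fluid, owing to the equation of continuity. The isotropic covariance matrix (6) (which is a second-order tensor) is then called solenoidal. With this further condition, which implies

$$\frac{\partial U_1(\mathbf{x})}{\partial x_1} + \frac{\partial U_2(\mathbf{x})}{\partial x_2} + \frac{\partial U_3(\mathbf{x})}{\partial x_3} = 0, \qquad (7)$$

we have

$$\left(\frac{\partial}{\partial\mathbf{x}}\right)'\mathbf{V}(\boldsymbol{\xi}) = \left(\frac{\partial}{\partial\boldsymbol{\xi}}\right)'\mathbf{V}(\boldsymbol{\xi}) = 0,$$

where $(\partial/\partial\mathbf{x})$ is the column vector $\{\partial/\partial x_i\}$, and $\boldsymbol{\xi} = \mathbf{x} - \mathbf{y}$. Substituting for $\mathbf{V}(\boldsymbol{\xi})$ from (6), and noting that

$$\frac{\partial A(\xi)}{\partial\xi_i} = \frac{\partial\xi}{\partial\xi_i}\,\frac{\partial A(\xi)}{\partial\xi} = \frac{\xi_i}{\xi}\,\frac{\partial A(\xi)}{\partial\xi},$$

we obtain

$$\left[\xi\,\frac{\partial A(\xi)}{\partial\xi} + 4A(\xi) + \frac{1}{\xi}\,\frac{\partial B(\xi)}{\partial\xi}\right]\boldsymbol{\xi}' = 0,$$

whence

$$4A(\xi) + \xi\,\frac{\partial A(\xi)}{\partial\xi} + \frac{1}{\xi}\,\frac{\partial B(\xi)}{\partial\xi} = 0. \qquad (8)$$

For the isotropic forms of other moment tensors reference should be made to Robertson (1940).

In turbulence theory it is usual to define equivalent scalar functions $f(\xi)$, $g(\xi)$, in terms of which (6) is written

$$\mathbf{V}(\boldsymbol{\xi}) = \frac{f - g}{\xi^2}\,\boldsymbol{\xi}\boldsymbol{\xi}' + g\,\mathbf{I}. \qquad (9)$$

This implies $g = B$, $A = (f - g)/\xi^2$, so that (8) becomes after reduction

$$g = f + \tfrac{1}{2}\,\xi\,\frac{\partial f}{\partial\xi}. \qquad (10)$$

We shall also denote the trace $f + 2g$ of the matrix $\mathbf{V}$ in (9) by $R(\xi)$.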
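Relation (10) can be checked numerically: a covariance matrix of the form (9) whose transverse function $g$ is derived from $f$ by (10) should have zero divergence. The sketch below is an illustration only; the choice $f(\xi) = e^{-\xi^2}$, the test point and the finite-difference step are assumptions, not taken from the text.

```python
import math

def f(xi):
    # Assumed longitudinal function, chosen only for illustration.
    return math.exp(-xi * xi)

def f_prime(xi):
    return -2.0 * xi * math.exp(-xi * xi)

def g(xi):
    # Transverse function from relation (10): g = f + (xi/2) df/dxi.
    return f(xi) + 0.5 * xi * f_prime(xi)

def V(vec, g_func=g):
    # Isotropic covariance matrix (9): (f - g) xi xi' / xi^2 + g I.
    xi = math.sqrt(sum(c * c for c in vec))
    fv, gv = f(xi), g_func(xi)
    return [[(fv - gv) * vec[i] * vec[j] / (xi * xi) + (gv if i == j else 0.0)
             for j in range(3)] for i in range(3)]

def divergence(vec, g_func=g, h=1e-4):
    # Central-difference estimate of sum_i dV_ij/dxi_i for j = 0, 1, 2.
    out = []
    for j in range(3):
        total = 0.0
        for i in range(3):
            plus, minus = list(vec), list(vec)
            plus[i] += h
            minus[i] -= h
            total += (V(plus, g_func)[i][j] - V(minus, g_func)[i][j]) / (2 * h)
        out.append(total)
    return out

point = [0.3, -0.5, 0.8]
print("solenoidal choice of g:", divergence(point))
print("g = f (not solenoidal):", divergence(point, g_func=f))
```

With $g$ taken from (10) the three components should vanish to within discretization error, whereas the choice $g = f$ (so that $A = 0$) leaves a divergence of order $f'(\xi)$.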
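Consider now the corresponding forms for the spectral functions. We have in general

$$\mathbf{V}(\boldsymbol{\xi}) = \int e^{i\boldsymbol{\xi}'\boldsymbol{\omega}}\,d\mathbf{F}(\boldsymbol{\omega}); \qquad (11)$$

assuming that the spectral function densities $\mathbf{f}(\boldsymbol{\omega}) \equiv d\mathbf{F}(\boldsymbol{\omega})/d\boldsymbol{\omega}$ exist, we note further that the matrix or tensor spectral density

$$\mathbf{f}(\boldsymbol{\omega}) = \frac{1}{(2\pi)^3}\int e^{-i\boldsymbol{\xi}'\boldsymbol{\omega}}\,\mathbf{V}(\boldsymbol{\xi})\,d\xi_1\,d\xi_2\,d\xi_3. \qquad (12)$$

We then easily obtain, if we replace $\mathbf{V}(\boldsymbol{\xi})$ by the expression in (6),

$$\mathbf{f}(\boldsymbol{\omega}) = \alpha(\omega)\,\boldsymbol{\omega}\boldsymbol{\omega}' + \beta(\omega)\,\mathbf{I}, \qquad (13)$$

where $\alpha(\omega)$, $\beta(\omega)$ are expressible in terms of the (three-dimensional) Fourier transforms of $A(\xi)$, $B(\xi)$ corresponding to equation (11) by relations (14) [...].

The further condition that $\mathbf{V}(\boldsymbol{\xi})$ is solenoidal as well as isotropic means in the $\boldsymbol{\omega}$-space that $\mathbf{f}(\boldsymbol{\omega})$ in (13) also satisfies the relation $\boldsymbol{\omega}'\mathbf{f}(\boldsymbol{\omega}) = 0$, whence

$$\alpha(\omega)\,\omega^2 + \beta(\omega) = 0. \qquad (15)$$

Relation (13) then becomes

$$\mathbf{f}(\boldsymbol{\omega}) = \beta(\omega)\left(\mathbf{I} - \frac{\boldsymbol{\omega}\boldsymbol{\omega}'}{\omega^2}\right). \qquad (16)$$

We define new functions in (13) analogously to (9) and write

$$\mathbf{f}(\boldsymbol{\omega}) = \frac{a - b}{\omega^2}\,\boldsymbol{\omega}\boldsymbol{\omega}' + b\,\mathbf{I}, \qquad (17)$$

with $b = \beta$ and $\omega^2\alpha = a - b$, whence (15) becomes

$$a(\omega) = 0. \qquad (18)$$

Thus the function $a(\omega)$ vanishes for incompressible fluids.

Returning to the relations (14), and making use of the reduced formula for the three-dimensional transform of a scalar function $A(\xi)$, viz.

$$\frac{1}{(2\pi)^3}\int e^{-i\boldsymbol{\xi}'\boldsymbol{\omega}}\,A(\xi)\,d\boldsymbol{\xi} = \frac{4\pi}{(2\pi)^3}\int_0^{\infty}\frac{\sin\xi\omega}{\xi\omega}\,\xi^2 A(\xi)\,d\xi, \qquad (19)$$

we obtain the formulae

$$a(\omega) = \frac{1}{2\pi^2\omega}\int_0^{\infty}\xi\left[f\sin\xi\omega - 2(f - g)\left(\frac{\sin\xi\omega}{\xi^2\omega^2} - \frac{\cos\xi\omega}{\xi\omega}\right)\right]d\xi,$$

$$b(\omega) = \frac{1}{2\pi^2\omega}\int_0^{\infty}\xi\left[g\sin\xi\omega + (f - g)\left(\frac{\sin\xi\omega}{\xi^2\omega^2} - \frac{\cos\xi\omega}{\xi\omega}\right)\right]d\xi. \qquad (20)$$

Notice that $F(\omega)$, the trace $a + 2b$ of $\mathbf{f}(\boldsymbol{\omega})$, is related to $R(\xi)$ by

$$F(\omega) = \frac{1}{2\pi^2}\int_0^{\infty}\frac{\xi\sin\xi\omega}{\omega}\,R(\xi)\,d\xi. \qquad (21)$$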


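Longitudinal and transverse components. If we consider the Fourier relation

$$\mathbf{U}(\boldsymbol{\xi}) = \int e^{i\boldsymbol{\xi}'\boldsymbol{\omega}}\,d\mathbf{Z}(\boldsymbol{\omega}) \qquad (22)$$

for the random vector quantity $\mathbf{U}(\boldsymbol{\xi}) \equiv \{U_i(\boldsymbol{\xi})\}$ ($i = 1, 2, 3$) itself, we may resolve $d\mathbf{Z}(\boldsymbol{\omega})$ into components along and perpendicular to $\boldsymbol{\omega}$. Thus

$$d\mathbf{Z}(\boldsymbol{\omega}) = (\boldsymbol{\omega}/\omega)\,dZ_\omega(\boldsymbol{\omega}) + d\mathbf{T}(\boldsymbol{\omega}), \qquad (23)$$

where, if $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are orthogonal unit vectors with $\boldsymbol{\omega}/\omega$, such that the corresponding components $dZ_\alpha(\boldsymbol{\omega})$, $dZ_\beta(\boldsymbol{\omega})$ are uncorrelated,

$$d\mathbf{T}(\boldsymbol{\omega}) = \boldsymbol{\alpha}\,dZ_\alpha(\boldsymbol{\omega}) + \boldsymbol{\beta}\,dZ_\beta(\boldsymbol{\omega}). \qquad (24)$$

The isotropy condition gives

$$E\{d\mathbf{T}(\boldsymbol{\omega})\,d\mathbf{T}'(\boldsymbol{\omega})\} = (\boldsymbol{\alpha}\boldsymbol{\alpha}' + \boldsymbol{\beta}\boldsymbol{\beta}')\,E\{dZ_\alpha(\boldsymbol{\omega})\,dZ_\alpha^*(\boldsymbol{\omega})\}, \qquad (25)$$

and making use of the matrix relation

$$\boldsymbol{\alpha}\boldsymbol{\alpha}' + \boldsymbol{\beta}\boldsymbol{\beta}' + \boldsymbol{\omega}\boldsymbol{\omega}'/\omega^2 = \mathbf{I},$$

we note that the isotropy condition (25) implies further that

$$E\{d\mathbf{Z}(\boldsymbol{\omega})\,d\mathbf{Z}'(\boldsymbol{\omega})\} = \mathbf{f}(\boldsymbol{\omega})\,d\boldsymbol{\omega} = (\boldsymbol{\omega}\boldsymbol{\omega}'/\omega^2)\left(E\{dZ_\omega(\boldsymbol{\omega})\,dZ_\omega^*(\boldsymbol{\omega})\} - E\{dZ_\alpha(\boldsymbol{\omega})\,dZ_\alpha^*(\boldsymbol{\omega})\}\right) + \mathbf{I}\,E\{dZ_\alpha(\boldsymbol{\omega})\,dZ_\alpha^*(\boldsymbol{\omega})\},$$

with $dZ_\omega(\boldsymbol{\omega})$ and $d\mathbf{T}(\boldsymbol{\omega})$ necessarily uncorrelated. This provides the identification

$$E\{dZ_\omega(\boldsymbol{\omega})\,dZ_\omega^*(\boldsymbol{\omega})\} = a(\omega)\,d\boldsymbol{\omega},$$

$$E\{dZ_\alpha(\boldsymbol{\omega})\,dZ_\alpha^*(\boldsymbol{\omega})\} = E\{dZ_\beta(\boldsymbol{\omega})\,dZ_\beta^*(\boldsymbol{\omega})\} = b(\omega)\,d\boldsymbol{\omega}. \qquad (26)$$

In turbulence theory where $\mathbf{U}(\boldsymbol{\xi})$ is the velocity vector, we sometimes require to consider derived quantities such as the vorticity vector $\mathbf{v}$, where

$$\mathbf{v} = \nabla \times \mathbf{U}.$$

Correspondingly, if $d\mathbf{W}(\boldsymbol{\omega})$ is the corresponding random Fourier component, we have

$$d\mathbf{W}(\boldsymbol{\omega}) = i\boldsymbol{\omega} \times d\mathbf{Z}(\boldsymbol{\omega}),$$

where $\times$ denotes a vector product. By definition $d\mathbf{W}(\boldsymbol{\omega})$ has no longitudinal component whether or not the fluid is incompressible. If we have isotropy, we have also the simple relation

$$E\{d\mathbf{W}(\boldsymbol{\omega})\,d\mathbf{W}'(\boldsymbol{\omega})\} = \omega^2\,b(\omega)\left(\mathbf{I} - \frac{\boldsymbol{\omega}\boldsymbol{\omega}'}{\omega^2}\right)d\boldsymbol{\omega}. \qquad (27)$$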


Chapter 7 
PREDICTION AND COMMUNICATION THEORY 


7-1 Linear prediction for stationary processes 


In Chapter 1 we saw that some stochastic processes may be 
termed deterministic, and that their states may be predicted 
exactly at any future time from suitable information about the 
past. When a process is non-deterministic, the problem still 
arises as to how to predict its future state with, in some sense, 
a minimum of error. This is of particular relevance when the past 
history of the process can be examined in detail, and used as a 
basis for the prediction. The solution has been developed by
Wiener in the important practical case of linear 'least-squares'
prediction for stationary processes, and extended to cover
associated problems such as the design of linear wave filters to
eliminate 'noise'. An introduction to this technique is given in
this section; for a more complete treatment reference should be
made to Wiener (1949).


Let us first examine the prediction problem for the linear stationary process

$$X(t) = \int_{-\infty}^{t} g(t-u)\,dY(u), \qquad (1)$$

where natural damping is counterbalanced by continual further independent disturbances. We at once have

$$X(t+\tau) = \int_{-\infty}^{t+\tau} g(t+\tau-u)\,dY(u) = \int_{t}^{t+\tau} g(t+\tau-u)\,dY(u) + \int_{-\infty}^{t} g(t+\tau-u)\,dY(u),$$

where the first term on the right represents a contribution to $X(t+\tau)$ from disturbances arising after time $t$. We therefore have an essential error of prediction with variance

$$\sigma_Y^2\int_0^{\tau} g(\tau-z)\,g^*(\tau-z)\,dz = \sigma_Y^2\int_0^{\tau}|g(z)|^2\,dz, \qquad (2)$$

where $\sigma_Y^2\,du \equiv E\{dY(u)\,dY^*(u)\}$.

(Notice that even for the non-stationary linear process

$$X(t) = \int_0^{t} g(t-u)\,dY(u),$$

where $\int_0^{\infty} g(u)\,g^*(u)\,du$ does not necessarily converge, the above argument would still be applicable.)
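As an illustration of formula (2), the sketch below evaluates the essential prediction-error variance numerically for the exponential weight function $g(u) = e^{-\mu u}$ and compares it with the closed form $\sigma_Y^2(1 - e^{-2\mu\tau})/(2\mu)$ obtained by direct integration. The function name, the midpoint rule and the parameter values are assumptions made for the example.

```python
import math

def prediction_error_variance(g, tau, sigma_y2=1.0, steps=20000):
    # sigma_Y^2 * integral_0^tau |g(z)|^2 dz, formula (2), by the midpoint rule.
    h = tau / steps
    return sigma_y2 * h * sum(abs(g((k + 0.5) * h)) ** 2 for k in range(steps))

mu, tau = 0.8, 1.5  # illustrative values
numeric = prediction_error_variance(lambda u: math.exp(-mu * u), tau)
closed = (1.0 - math.exp(-2.0 * mu * tau)) / (2.0 * mu)
stationary = 1.0 / (2.0 * mu)  # limit of (2) as tau -> infinity

print("numerical value of (2):", numeric)
print("closed form           :", closed)
print("fraction of the stationary variance:", closed / stationary)
```

As $\tau$ increases the error variance rises towards the stationary variance $\sigma_Y^2/(2\mu)$ of the process, so that the prediction carries progressively less information.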

More generally, for all stationary processes with a representation (1) valid to the second-order moments, i.e. $Y(u)$ orthogonal but not necessarily additive, we should naturally expect the second term above, namely,

$$X_p(t+\tau) = \int_{-\infty}^{t} g(t+\tau-u)\,dY(u), \qquad (3)$$

to be the best linear prediction formula, and its error variance would still be given by (2). In the particular case of normal stationary processes it is also evident that the best linear prediction and best prediction will coincide.

In the solution (3) we still require of course to express $Y(u)$ ($u \le t$) in terms of $X(v)$ ($v \le t$), and this must be done by solving (1) for $Y(u)$ in terms of $X(u)$.† It may, incidentally, be noted that while the solution so far developed is in general formal, it provides at once the complete and rigorous solution for the important class of linear processes which are themselves the solutions of stochastic equations representing the dynamical behaviour of physical systems, such as the torsional galvanometer, under random independent impulses. The replacement of $dY(u)$ (or any convergent linear expression in $Y(u)$) by a linear expression in $X(u)$ is then immediate. For example, for the Markov process generated by the equation $dX(t) + \mu X(t)\,dt = dY(t)$, for which $g(u) = e^{-\mu u}$ ($u \ge 0$), we obtain

$$X_p(t+\tau) = e^{-\mu\tau}\int_{-\infty}^{t} e^{-\mu(t-u)}\,dY(u) = e^{-\mu\tau}X(t),$$

a formula depending, as of course it should for a Markov process, only on the last available value $X(t)$.

† For this to be possible, $g(u)$ must satisfy the condition referred to on p. 201, that its Fourier transform, regarded as a function of the complex variable $\omega$, has no zeros in the lower half-plane.

To obtain the general solution (equivalent to (3)) we consider the linear autoregressive formula

$$X_p(t+\tau) = \int_0^{\infty} X(t-u)\,dB(u), \qquad (4)$$

and try to determine $B(u)$ so that

$$E\{|X(t+\tau) - X_p(t+\tau)|^2\}$$

is a minimum. This problem is quite analogous to the classical least-squares estimation problem for a finite number of regression coefficients (the corresponding problem for stationary sequences is of course even closer to the classical regression problem; see equation (15) of the next section, § 7-11), and we should expect to have as the equation for $dB(u)$

$$\sigma_X^2\int_0^{\infty}\rho(v-u)\,dB(u) = \sigma_X^2\,\rho(\tau+v) \qquad (v \ge 0). \qquad (5)$$

That this solution does in fact minimize the least-squares prediction error may be checked by considering any prediction formula

$$X_p'(t+\tau) = \int_0^{\infty} X(t-v)\,dB'(v).$$

Then

$$E\{|X(t+\tau) - X_p'(t+\tau)|^2\} = E\{|X(t+\tau) - X_p(t+\tau)|^2\} + E\{|X_p(t+\tau) - X_p'(t+\tau)|^2\}$$

$$+ \text{ real part } 2E\left\{\left[X^*(t+\tau) - \int_0^{\infty} X^*(t-u)\,dB^*(u)\right]\int_0^{\infty} X(t-v)\,d[B(v) - B'(v)]\right\}.$$

We assume that $\rho(u)$ is continuous and that the integration and expectation symbol in the last term may be commuted. Then the last term becomes the real part of

$$2\sigma_X^2\int_0^{\infty}\left[\rho^*(\tau+v) - \int_0^{\infty}\rho^*(v-u)\,dB^*(u)\right]d[B(v) - B'(v)],$$

and vanishes in view of (5). Hence $X_p'$ can never give a smaller prediction error than $X_p$, as the mean-square error for $X_p'$ consists of the error for $X_p$ plus a term which is essentially non-negative.

In order to solve the integral equation (5), we shall assume that the linear representation (1) is possible in the wide sense (i.e. for $Y(u)$ orthogonal); that is, we assume (see equation (18), § 6-3) that $X(t)$ has an absolutely continuous spectrum $f(\omega)$ for which (except for cases where perfect linear prediction is possible)

$$\int_{-\infty}^{\infty}\frac{|\log f(\omega)|}{1+\omega^2}\,d\omega < \infty. \qquad (6)$$

We may then define $\alpha(\omega)$ as in Chapter 6, with no zeros or singularities in the lower half-plane and so that

$$\alpha(\omega)\,\alpha^*(\omega) = \sigma_X^2 f(\omega),$$

and write correspondingly

$$\sigma_X^2\,\rho(u) = \int_0^{\infty}\beta(v+u)\,\beta^*(v)\,dv \qquad (\beta(v) = 0 \text{ for } v < 0),$$

where $\beta(u) \equiv \sigma_Y\,g(u)$ was formally related to $\alpha(\omega)$ by the Fourier relation

$$\beta(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{iu\omega}\,\alpha(\omega)\,d\omega.$$

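Now the mean-square error of prediction may be written

$$\sigma_X^2\left\{1 - \int_0^{\infty}\rho(\tau+u)\,dB^*(u) - \int_0^{\infty}\rho^*(\tau+u)\,dB(u) + \int_0^{\infty}\int_0^{\infty}\rho(v-u)\,dB(u)\,dB^*(v)\right\} = \sigma_X^2\int_{-\infty}^{\infty}|e^{i\tau\omega} - b(\omega)|^2 f(\omega)\,d\omega, \qquad (7)$$

where $b(\omega)$ is defined formally by

$$b(\omega) = \int_0^{\infty} e^{-i\omega v}\,dB(v) \qquad (dB(v) = 0 \text{ for } v < 0). \qquad (8)$$

The problem of prediction may thus from (7) be regarded as approximating 'in the mean' to $e^{i\tau\omega}\sqrt{f(\omega)}$ by $\sqrt{f(\omega)}\,b(\omega)$. Replace $\sigma_X\sqrt{f(\omega)}$ by $\alpha(\omega)$ and write

$$e^{i\tau\omega}\,\alpha(\omega) = c_1(\omega) + c_2(\omega),$$

where $c_1(\omega)$ is a 'backward' operator, so-called if it is without singularities (and of appropriate order at infinity) below the real axis, and $c_2(\omega)$ is correspondingly 'forward'. Then (7) becomes

$$\int_{-\infty}^{\infty}|c_1(\omega) - b(\omega)\,\alpha(\omega)|^2\,d\omega + \int_{-\infty}^{\infty}|c_2(\omega)|^2\,d\omega, \qquad (9)$$

this result following (as $b(\omega)\,\alpha(\omega)$ is backward) from the general formula

$$\int_{-\infty}^{\infty}\gamma_1(\omega)\,\gamma_2^*(\omega)\,d\omega = 0$$

for any two operators $\gamma_1(\omega)$ and $\gamma_2(\omega)$ of backward and forward type respectively for which the integral exists. Note also that if

$$\gamma(\omega) = \gamma_1(\omega) + \gamma_2(\omega),$$

and

$$\Gamma(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{it\omega}\,\gamma(\omega)\,d\omega,$$

then

$$\gamma_1(\omega) = \int_0^{\infty} e^{-it\omega}\,\Gamma(t)\,dt, \qquad \gamma_2(\omega) = \int_{-\infty}^{0} e^{-it\omega}\,\Gamma(t)\,dt.$$

As only $b(\omega)\,\alpha(\omega)$ is at our choice, the expression in (7) is a minimum when

$$b(\omega)\,\alpha(\omega) = c_1(\omega), \qquad (10)$$

which must therefore represent the solution of (5). From the above relations for $\gamma_1(\omega)$, $\gamma_2(\omega)$, we may readily see further that

$$c_1(\omega) = \frac{1}{2\pi}\int_0^{\infty} e^{-iu\omega}\,du\int_{-\infty}^{\infty} e^{i(u+\tau)v}\,\alpha(v)\,dv. \qquad (11)$$

Moreover, the mean-square prediction error is now from (9)

$$\int_{-\infty}^{\infty}|c_2(\omega)|^2\,d\omega, \qquad (12)$$

which may be shown without much difficulty to be equivalent to (2).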

There are obvious advantages in deferring any attempt to establish conditions to ensure the validity of the above mainly formal arguments. Suppose $\alpha(\omega)$ is of the general ratio of polynomials type referred to in equation (19) of § 6-3, and expressed in partial fractions in the form

$$\alpha(\omega) = \sum_{p,q} a_{pq}\,(\omega - \omega_p)^{-q}; \qquad (13)$$

then an explicit formal solution for $b(\omega)$ may be shown to be

$$\alpha(\omega)\,b(\omega) = \sum_{p,q} a_{pq}\,e^{i\tau\omega_p}\sum_{r=0}^{q-1}\frac{(i\tau)^{q-1-r}}{(q-1-r)!\,(\omega - \omega_p)^{r+1}}. \qquad (14)$$

Now if we consider the particular spectrum

$$f(\omega) = \frac{1}{2\pi}\,\frac{2\alpha\beta}{(\beta - \omega^2)^2 + \alpha^2\omega^2}, \qquad (15)$$

we have at once, by making use of the relation of (15) to the second-order process

$$d\dot{X}(t) + \alpha\dot{X}(t)\,dt + \beta X(t)\,dt = dY(t), \qquad (16)$$

and the method indicated at the beginning of this section,

$$X_p(t+\tau) = \int_{-\infty}^{t} g(t+\tau-u)\,dY(u),$$

where $g(u) = (e^{\lambda_1 u} - e^{\lambda_2 u})/(\lambda_1 - \lambda_2)$, $\lambda_1$ and $\lambda_2$ being the roots of $\lambda^2 + \alpha\lambda + \beta = 0$. We thus find

$$X_p(t+\tau) = \left[\frac{\lambda_1 e^{\lambda_2\tau} - \lambda_2 e^{\lambda_1\tau}}{\lambda_1 - \lambda_2}\right]X(t) + \left[\frac{e^{\lambda_1\tau} - e^{\lambda_2\tau}}{\lambda_1 - \lambda_2}\right]DX(t), \qquad (17)$$

where $D$ is the differential operator $d/dt$. The dependence of $X_p(t+\tau)$ on the two quantities $X(t)$, $\dot{X}(t)$, but not on other values of $X(u)$, $u < t$, would of course be expected from the Markov character of the process (16) when regarded as a joint process in $X$, $\dot{X}$. But the formal Fourier inverse of $d/dt$ is $i\omega$, and it is readily verified that (14) gives the solution

$$b(\omega) = \xi(\tau) + i\omega\,\eta(\tau),$$

where $\xi(\tau)$, $\eta(\tau)$ are the corresponding expressions in square brackets in (17). Thus valid solutions $b(\omega)$ are not restricted to bounded functions, and correspondingly, $B(t)$ need not be of limited total variation. This use of differential operator expansions is considered further in the next section.
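The two bracketed coefficients in (17) are the weights that carry $X(t)$ and $\dot{X}(t)$ forward under the undisturbed equation $\ddot{x} + \alpha\dot{x} + \beta x = 0$, since future disturbances contribute nothing to the predictor. The sketch below checks this by comparing (17) with a Runge-Kutta integration of that equation; the parameter values, initial values and function names are illustrative assumptions, and distinct roots $\lambda_1 \neq \lambda_2$ are assumed.

```python
import cmath

def predictor_coefficients(alpha, beta, tau):
    # xi(tau), eta(tau): the two square-bracket expressions of (17).
    # Assumes alpha^2 != 4*beta, so that the roots are distinct.
    disc = cmath.sqrt(alpha * alpha - 4.0 * beta)
    l1, l2 = (-alpha + disc) / 2.0, (-alpha - disc) / 2.0
    e1, e2 = cmath.exp(l1 * tau), cmath.exp(l2 * tau)
    xi = (l1 * e2 - l2 * e1) / (l1 - l2)
    eta = (e1 - e2) / (l1 - l2)
    return xi.real, eta.real  # real for real alpha, beta

def integrate_homogeneous(alpha, beta, x0, v0, tau, steps=2000):
    # Classical RK4 for x'' + alpha x' + beta x = 0 from (x0, v0) over time tau.
    h = tau / steps
    x, v = x0, v0

    def deriv(x, v):
        return v, -alpha * v - beta * x

    for _ in range(steps):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = deriv(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = deriv(x + h * k3x, v + h * k3v)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return x

alpha, beta, tau = 0.6, 2.0, 1.3      # illustrative values
x_now, xdot_now = 1.0, -0.5           # assumed current X(t) and its derivative
xi, eta = predictor_coefficients(alpha, beta, tau)
print("prediction from (17):", xi * x_now + eta * xdot_now)
print("undisturbed motion  :", integrate_homogeneous(alpha, beta, x_now, xdot_now, tau))
```

At $\tau = 0$ the coefficients reduce to $\xi = 1$, $\eta = 0$, as they must.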


7-11 Further associated problems. Orthogonal expansions. If $\rho(\tau)$ is analytic, it has been noted in Chapter 5 that a complete Taylor expansion for $X(t+\tau)$ is possible in terms of its m.s. differential coefficients, $X^{(n)}(t)$, so that

$$X(t+\tau) = \sum_{n=0}^{\infty}\frac{\tau^n}{n!}\,X^{(n)}(t), \qquad (1)$$

and the prediction error is zero. [...] § 7-1 is associated with a slightly wider class of 'quasi-analytic' functions (see Paley and Wiener, 1934, Chapter 1), though in practical cases the two conditions are usually satisfied, or not satisfied, together.

If the m.s. differential coefficients $X^{(n)}$ only exist up to a certain order, and it is required to express $X(t+\tau)$ as nearly as possible in terms of these, the appropriate coefficients will, however, no longer be $1$, $\tau$, $\tfrac{1}{2}\tau^2$, etc., but those given by the least-squares solution. The latter is conveniently obtained in terms of an orthogonal set of linear combinations of $X^{(n)}$. It will be recalled that for a real stationary process

$$\frac{d}{dt}E\{X^2\} = 2E\{X\dot{X}\} = 0, \qquad \frac{d}{dt}E\{\dot{X}^2\} = 2E\{\dot{X}\ddot{X}\} = 0,$$

but

$$\frac{d}{dt}E\{X\dot{X}\} = E\{X\ddot{X}\} + E\{\dot{X}^2\} = 0,$$

so that $X(t)$ is orthogonal, not to $\ddot{X}(t)$, but to

$$\ddot{X} + X\,\sigma^2(\dot{X})/\sigma^2(X).$$

In general, let the $s$th orthogonal function be $X_s$, and denote $\sigma^2(X^{(s)})$ by $\sigma_s^2$. Evidently

$$E\{X^{(s)}X^{(s+2r+1)}\} = (-1)^r E\{X^{(s+r)}X^{(s+r+1)}\} = 0 \qquad (s \ge 0,\ r \ge 0),$$

and odd- and even-order derivatives form two separate groups of orthogonal functions, and to every orthogonal function $X_{2s}$ there is another one $X_{2s+1}$ [...]. To obtain the recurrence relation for $X_{2s}$, let

$$X_{2s} = X^{(2s)} + \sum_{r<s} a_{rs}\,X_{2r}.$$

Then

$$E\{X_{2s}X_{2r}\} = E\{X^{(2s)}X_{2r}\} + a_{rs}\,E\{X_{2r}^2\} = 0,$$

whence

$$a_{rs} = -E\{X^{(2s)}X_{2r}\}/E\{X_{2r}^2\}. \qquad (2)$$

The least-squares coefficients in the orthogonal expansion of $X_p(t+\tau)$ are then given by $E\{X(t+\tau)\,X_s(t)\}/E\{X_s^2(t)\}$ and are expressible in terms of $\rho(\tau)$. For example,

$$E\{X(t+\tau)\,X(t)\}/E\{X^2(t)\} = \rho(\tau),$$

$$E\{X(t+\tau)\,\dot{X}(t)\}/E\{\dot{X}^2(t)\} = \rho'(\tau)/\rho''(0).$$

In the example $\rho(\tau) = e^{-\mu|\tau|}$, $X(t)$ is not differentiable and prediction by the above series terminates at the first term; in the example (15) of § 7-1 it stops at the second. In these two cases the terminating series provides the most efficient formula, but this is not in general true; for instance, if an extra additive component were present in $X(t)$, the resulting process would no longer be [...], and [...] formula of the above type could proceed [...].

The relation of expansions in terms of $X^{(n)}(t)$ to the general predictive formula may be seen by noting that the complete Taylor expansion for $X(t)$ corresponds to writing

$$b(\omega) = 1 + i\omega\tau + \tfrac{1}{2}(i\omega\tau)^2 + \ldots,$$

and that if $X(t)$ is only differentiable up to order $s$, the formula for $b(\omega)$ may be written

$$b(\omega) = 1 + i\omega\tau + \tfrac{1}{2}(i\omega\tau)^2 + \ldots + \frac{(i\omega\tau)^s}{s!} + \frac{1}{2\pi\,\alpha(\omega)}\int_0^{\infty} e^{-iu\omega}\,du\int_{-\infty}^{\infty}\alpha(v)\,e^{iuv}\left[e^{i\tau v} - 1 - i\tau v - \ldots - \frac{(i\tau v)^s}{s!}\right]dv. \qquad (3)$$

But for $\rho(\tau) = e^{-\mu|\tau|}$ and the prediction $X_p(t+\tau) = \rho(\tau)\,X(t)$, we note from (3) that the remainder term is not zero if we write $b(\omega) = 1 + \ldots$, though it is for $b(\omega) = \rho(\tau) + \ldots$.

The expansion of $b(\omega)$ in powers of $\omega$ is replaced in the above orthogonal expansion by an expansion of $b(\omega)$ in polynomials of $\omega$ orthogonal with respect to integration over $f(\omega)\,d\omega$; this is easily seen from the correspondence between the derivatives and powers of $\omega$. Such orthogonal polynomials can of course only be defined when the moments of the spectral density function exist, and this corresponds to the existence of the derivatives of $\rho(\tau)$. A more general method of using orthogonal functions has been suggested by Wiener and illustrated by an expansion in terms of Laguerre functions. Let us write

$$b(\omega) = \sum_{n=0}^{N} a_n\,b_n(\omega), \qquad (4)$$

and approximate to $\alpha(\omega)\,e^{i\tau\omega}$ by $\alpha(\omega)\sum a_n\,b_n(\omega)$. The admissible functions are again to be transforms of functions vanishing for negative argument and to have no singularities below the real axis; this condition applies also to $b_n(\omega)\,\alpha(\omega)$, which are chosen to form an orthogonal set. We may take, for example,

$$b_n(\omega)\,\alpha(\omega) = \frac{1}{\sqrt{\pi}}\,\frac{(1 - i\omega)^n}{(1 + i\omega)^{n+1}}. \qquad (5)$$

Illustrating this method on the familiar case $\rho(\tau) = e^{-\mu|\tau|}$, or

$$f(\omega) = \frac{\mu}{\pi(\mu^2 + \omega^2)},$$

we choose for convenience a time-scale for which $\mu = 1$. Then the coefficients $a_n$ vanish for $n > 0$, only the term $n = 0$ remaining [...], whence

$$X_p(t+\tau) = e^{-\tau}X(t) \qquad (\mu = 1).$$

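The problem of 'noise'. The problem of a superposed and unwanted component (noise component), important in electrical applications, may be treated by similar methods. A typical problem is to estimate $Y(t+\tau)$ for positive or negative (or zero) $\tau$ from the observed process

$$X(t) = Y(t) + Z(t),$$

where $Z(t)$ is the noise component. Thus for the case where $Z(t)$ is uncorrelated with $Y(t)$, Wiener has shown by similar methods that the previous formula (11) of § 7-1 for $\alpha(\omega)\,b(\omega)$ should be modified to

$$\alpha(\omega)\,b(\omega) = \frac{1}{2\pi}\int_0^{\infty} e^{-iu\omega}\,du\int_{-\infty}^{\infty} e^{i(u+\tau)v}\,\alpha_1(v)\,dv, \qquad (6)$$

where

$$\alpha_1(\omega) = \sigma_Y^2\,f_Y(\omega)/\alpha^*(\omega). \qquad (7)$$

The error variance in the prediction

$$Y_p(t+\tau) = \int_0^{\infty} X(t-u)\,dB(u)$$

is given by

$$\sigma_Y^2\int_{-\infty}^{\infty} f_Y(\omega)\,d\omega - \int_{\tau}^{\infty}|\beta_1(u)|^2\,du, \qquad (8)$$

where $\beta_1(u)$ is the transform of $\alpha_1$,

$$\beta_1(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{iuv}\,\alpha_1(v)\,dv.$$

In agreement with the well-known efficiency of linear combinations of least-squares estimates, it may also be shown that to approximate to $\dot{Y}(t+\tau)$ we should write

$$\dot{Y}_p(t+\tau) = \int_0^{\infty} X(t-u)\,dC(u),$$

where

$$\alpha(\omega)\,c(\omega) = \frac{1}{2\pi}\int_0^{\infty} e^{-iu\omega}\,du\int_{-\infty}^{\infty} e^{i(u+\tau)v}\,[iv\,\alpha_1(v)]\,dv. \qquad (9)$$

The error variance is

$$\sigma_Y^2\int_{-\infty}^{\infty}\omega^2 f_Y(\omega)\,d\omega - \int_{\tau}^{\infty}|\beta_2(u)|^2\,du, \qquad (10)$$

where $\beta_2(u)$ is the transform of $i\omega\,\alpha_1(\omega)$, etc.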

Prediction for stationary sequences. The prediction procedure for sequences is naturally rather similar, and we merely summarize below the corresponding formulae. Formula (1) of § 7-1 becomes

$$X_s = \sum_{u=-\infty}^{s} g(s-u)\,Y_u, \qquad (11)$$

and the most efficient prediction formula is

$$[X_{s+\tau}]_p = \sum_{u=-\infty}^{s} g(s+\tau-u)\,Y_u, \qquad (12)$$

with error

$$\sum_{u=s+1}^{s+\tau} g(s+\tau-u)\,Y_u \qquad (13)$$

and error variance

$$\sigma_Y^2\sum_{v=0}^{\tau-1}|g(v)|^2. \qquad (14)$$

Equation (5) of § 7-1 is replaced by

$$\sum_{u=0}^{\infty}\rho_{s-u}\,a_u = \rho_{s+\tau} \qquad (s \ge 0), \qquad (15)$$

with solution for $a_u$ given by

$$\alpha(\omega)\,b(\omega) = \frac{1}{2\pi}\sum_{u=0}^{\infty} e^{-iu\omega}\int_{-\pi}^{\pi}\alpha(v)\,e^{i(u+\tau)v}\,dv, \qquad (16)$$

where

$$b(\omega) = \sum_{u=0}^{\infty} e^{-iu\omega}\,a_u \qquad (a_u = 0 \text{ for } u < 0).$$

In this solution the function $\alpha(\omega)$, which satisfies the relation $\alpha(\omega)\,\alpha^*(\omega) = \sigma_X^2 f(\omega)$, must be such that

$$\sqrt{2\pi}\,\sigma_Y\,g(u) = \int_{-\pi}^{\pi} e^{iu\omega}\,\alpha(\omega)\,d\omega$$

is zero for $u < 0$. The general formula for $\alpha(\omega)$ has been given by Wiener as

$$\alpha(\omega) = \sum_{u=0}^{\infty}\beta_u\,e^{-iu\omega} = \sigma_X\sqrt{f(\omega)}\;e^{i\theta(\omega)}, \qquad (17)$$

where

$$i\theta(\omega) = \tfrac{1}{2}\sum_{u=1}^{\infty} A_u\,e^{-iu\omega} - \tfrac{1}{2}\sum_{u=-\infty}^{-1} A_u\,e^{-iu\omega}; \qquad (18)$$

in (18) the coefficients $A_u$ are the Fourier coefficients of $\log f(\omega)$; that is,

$$\log f(\omega) = \sum_{u=-\infty}^{\infty} A_u\,e^{-iu\omega}. \qquad (19)$$

The condition parallel to equation (6) of § 7-1 is

$$\int_{-\pi}^{\pi}|\log f(\omega)|\,d\omega < \infty; \qquad (20)$$

otherwise perfect linear prediction is possible.

General autoregressive representations of the sequence type

$$[X_{s+\tau}]_p = \sum_{u=0}^{\infty} a_u\,X_{s-u}$$

were first considered by Wold (1938); explicit formulae for the coefficients $a_u$ and $\beta_u$ were obtained, independently of Wiener's work, by Kolmogorov (1941), who also considered the interpolation problem in the discrete case.

It is also possible to extend all these methods to predicting from multiple time-series, but such extensions are omitted here. In statistical applications at least it seems probable that any cases which arise are best handled directly as autoregressive problems in a finite number of variables (cf. Chapter 9).
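The sequence formulae (12) and (14) can be illustrated on the simplest case, the first-order autoregressive sequence $X_s = aX_{s-1} + Y_s$, for which $g(v) = a^v$. The sketch below is an illustration only; the function names and the value of $a$ are assumptions. It shows that the weights in (12) are $a^\tau$ times those in (11), so that $[X_{s+\tau}]_p = a^\tau X_s$, and that (14) sums to $\sigma_Y^2(1 - a^{2\tau})/(1 - a^2)$.

```python
def predictor_weights(g, tau, n_terms):
    # Weights g(tau), g(tau+1), ... applied to Y_s, Y_{s-1}, ... in formula (12).
    return [g(tau + k) for k in range(n_terms)]

def error_variance(g, tau, sigma_y2=1.0):
    # sigma_Y^2 * sum_{v=0}^{tau-1} |g(v)|^2, formula (14).
    return sigma_y2 * sum(abs(g(v)) ** 2 for v in range(tau))

a = 0.6                      # illustrative autoregressive coefficient, |a| < 1

def g_ar1(v):
    # Weight function of X_s = a X_{s-1} + Y_s in the representation (11).
    return a ** v

for tau in (1, 2, 5, 20):
    closed = (1.0 - a ** (2 * tau)) / (1.0 - a * a)
    print(tau, error_variance(g_ar1, tau), closed)

# The weights of (12) are a**tau times those of (11): the predictor is a**tau * X_s.
w = predictor_weights(g_ar1, 3, 5)
print([w[k] / g_ar1(k) for k in range(5)])
```

For large $\tau$ the error variance approaches the stationary variance $\sigma_Y^2/(1 - a^2)$, in parallel with the continuous-time case.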


7-2 Theory of communication

The theory of prediction, and indeed the theory of stationary processes in general, has a bearing on a topic which has aroused considerable interest in recent years, the general theory of communication. This theory has been systematized by Shannon (1948), and we include here some of his main theorems, treating these in an elementary way in keeping with the physical notions used. We depict any communication system schematically as in the diagram (fig. 8); we shall, however, consider first discrete noiseless systems, in which a sequence of choices from a finite set of elementary symbols is to be transmitted from one point to another over a channel with noise interference assumed absent. For example, as in the Morse code, the symbols may consist, say, of a dot, a dash, a letter space and a word space. These different symbols will in general have different durations. There may be in addition certain restrictions on permissible sequences; for example, two spaces will never be contiguous. A suitable definition of the capacity $C$ of the channel is

$$C = \lim_{T\to\infty}\log n(T)/T, \qquad (1)$$

where $n(T)$ denotes the number of permissible sequences of duration $T$. The base 2 for the logarithm is conventionally used.

Fig. 8. Schematic diagram of communication system (cf. Shannon (1948)). [Blocks shown: transmitter, channel, receiver, message destination.]
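Definition (1) can be illustrated by direct counting. The sketch below counts the sequences of total duration $T$ that can be formed from symbols of given integer durations, ignoring any restrictions on permissible sequences, and compares $\log_2 n(T)/T$ with $\log_2 x_0$, where $x_0 > 1$ is the root of $\sum_i x^{-t_i} = 1$ (the limiting value given by Shannon for this unrestricted case). The durations used are hypothetical, and the function names are assumptions made for the example.

```python
import math

def count_sequences(durations, total):
    # n(T) for T = 0..total: number of symbol sequences of exact duration T,
    # with no restrictions on which symbols may follow one another.
    n = [0] * (total + 1)
    n[0] = 1  # the empty sequence
    for t in range(1, total + 1):
        n[t] = sum(n[t - d] for d in durations if d <= t)
    return n

def capacity_estimate(durations, total):
    # log2 n(T) / T for a finite T, cf. definition (1).
    return math.log2(count_sequences(durations, total)[total]) / total

def capacity_from_root(durations, tol=1e-12):
    # log2 of the root x > 1 of sum(x**-d) = 1, located by bisection.
    lo, hi = 1.0, 2.0
    while sum(hi ** -d for d in durations) > 1.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(mid ** -d for d in durations) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.log2(0.5 * (lo + hi))

durations = [2, 4, 3, 6]  # hypothetical durations: dot, dash, letter space, word space
for T in (50, 200, 1500):
    print(T, capacity_estimate(durations, T))
print("limit from characteristic root:", capacity_from_root(durations))
```

The finite-$T$ estimates approach the limit with an error of order $1/T$. Imposing a restriction such as "no two spaces contiguous" would require counting by final state rather than by duration alone, and would lower the capacity.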

Consider next the discrete source of information. This is assumed to consist of a stationary stochastic process (non-deterministic and ergodic) in which symbols (for example, letters or words, and not necessarily the same symbols as in the signal sequence which represents a coding of the message) represent the possible states. Any sequence $S$ of finite duration $T$ (or number of symbols $N$) will have a probability $p(S)$ and we define the average information or entropy measure† for $S$ by

$$J = -E\{\log p(S)\}. \qquad (2)$$

The entropies per symbol or time unit are then defined as

$$H' = \lim J/N, \qquad H = \lim J/T \qquad (3)$$

(that these limits exist follows from the property

$$J(U) \le J(U, V) \le J(U) + J(V)).$$

† This information measure is defined by the completely specified distribution, and is different from information as used in statistics (see Chapter 8) in connection with problems of inference.

The justification of these definitions lies in their relations and use. Thus the fundamental theorem connecting $H'$ and $C$ is that it is possible to encode the information source so as to transmit as near the average rate $C/H'$ as we please, but it is not possible to transmit at a greater rate.

To see this, we note first that the entropy measure in (2) for 
a sequence of duration T is a maximum when all the permissible 
sequences S have equal probability, and its value then becomes 
logn(T), where n(7’) is the number of different possible sequences. 
Thus the maximum entropy rate for the symbol sequence U, 
say, of the channel is just its capacity C. Now the entropy of 
a discrete sequence is not altered by reversible encoding since
p(S)=p(U); and, since (while there may be finite delays) the
transmitter will be assumed to cope with the source without 
allowing any indefinitely increasing time-lag, the entropy rates 
H for source and channel must also be identical. Hence H for 
the source cannot be greater than C. 

One method of showing that it is possible to approach the maximum rate is to find an explicit code that will achieve this. Consider the possible sequences of $N$ symbols from the source, and arrange the probabilities $p(S)$ in order of decreasing probability

$$p_1 \ge p_2 \ge p_3 \ge \ldots \ge p_n.$$

Let

$$P_i = \sum_{j=1}^{i-1} p_j.$$

The $i$th message is encoded by expanding $P_i$ as a binary fraction and using only the first $t_i$ places, where

$$\log_2 1/p_i \le t_i < 1 + \log_2 1/p_i, \qquad (4)$$

or

$$1/2^{t_i} \le p_i < 1/2^{t_i-1}.$$

This ensures a reversible code. Thus $P_{i+1}$ differs by $p_i$ from $P_i$ and its binary expansion will therefore differ in one or more of the first $t_i$ places, and similarly for all others. The average length of the encoded message will be

$$E(t) = \sum p_i t_i,$$

where from (4)

$$-\sum p_i\log p_i \le \sum p_i t_i < \sum p_i(1 - \log p_i),$$

i.e.

$$J_N \le E(t) < 1 + J_N.$$

The average number of binary digits per message symbol is $E(t)/N$, so as $N \to \infty$ we obtain $H'$ as the limit of this average. (If the permissible signal sequences are not binary digits, we may set up a correspondence between the one set and the other.) It will be noticed that this code employs the intuitive choice of the shortest signals for the most probable sequences of message symbols.
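The coding rule just described can be written out directly. The following minimal sketch uses exact rational arithmetic so that the binary expansions of the cumulative sums $P_i$ are not affected by rounding; the function names and the four message probabilities are illustrative assumptions. It checks that no codeword is the beginning of another (so the code is reversible) and that the bounds $J \le E(t) < 1 + J$ hold.

```python
from fractions import Fraction
import math

def shannon_code(probs):
    # Sort probabilities in decreasing order and encode the i-th message by
    # the first t_i binary places of P_i = p_1 + ... + p_{i-1}.
    probs = sorted((Fraction(p) for p in probs), reverse=True)
    codes, cumulative = [], Fraction(0)
    for p in probs:
        t = 0
        while Fraction(1, 2 ** t) > p:   # smallest t with 2**-t <= p, as in (4)
            t += 1
        frac, bits = cumulative, []
        for _ in range(t):               # binary expansion of P_i to t places
            frac *= 2
            bit = int(frac)              # 0 or 1, since 0 <= frac < 2
            bits.append(str(bit))
            frac -= bit
        codes.append("".join(bits))
        cumulative += p
    return probs, codes

def is_prefix_free(codes):
    return not any(i != j and codes[j].startswith(codes[i])
                   for i in range(len(codes)) for j in range(len(codes)))

def entropy_bits(probs):
    # J = -sum p log2 p, formula (2) for a finite set of messages.
    return -sum(float(p) * math.log2(float(p)) for p in probs)

probs, codes = shannon_code(["0.4", "0.3", "0.2", "0.1"])
mean_length = sum(float(p) * len(c) for p, c in zip(probs, codes))
print(list(zip([str(p) for p in probs], codes)))
print("prefix-free:", is_prefix_free(codes))
print("J =", entropy_bits(probs), " E(t) =", mean_length)
```

Passing the probabilities as strings (or as `Fraction` objects) keeps the arithmetic exact; floating-point inputs would also work but could alter a binary digit when a cumulative sum lies close to a dyadic boundary.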
Noisy channels. When the received signal is distorted by noise,
the transmitted signal sequence U and received signal sequence
V must be distinguished. Let the joint probability of U
and V be p(U, V) and the conditional probability of U for given
V be $p_V(U)$. We define the conditional entropy $J_V(U)$ of U, given
V, by the relation

$$J(U, V) = -E\{\log p(U, V)\}$$
$$= -E\{\log p(V)\} - E_V\{E_{U|V} \log p_V(U)\}$$
$$= J(V) + J_V(U);$$

or in the mean as the lengths of the sequence increase
indefinitely

$$H(U, V) = H(V) + H_V(U). \tag{5}$$

The information lost through the noise effect or the equivocation
is defined as this conditional entropy $H_V(U)$ of the transmitted
signals when the received signals are known. The actual rate of
transmission (per unit time) is thus defined as

$$R = H(U) - H_V(U); \tag{6}$$

and the capacity C of the noisy channel is defined as max R as U
varies.

To establish the relevance of this definition for noisy channels,
in that we may still transmit at a rate approaching C with
arbitrarily small equivocation, we shall first digress for a moment
on formula (2). For long enough sequences the ergodic property is
assumed to imply that ‘nearly all’ such sequences are typical and
do not need further averaging, that is,

$$-\log p(S)/T \to H \tag{7}$$
in probability (for explicit conditions ensuring this, see §§ 2·21
and 8·2). For such a sequence this implies that its probability
p(S) tends to be independent of S and given by $2^{-TH}$, that is,
we may replace the actual ensemble of sequences by a set of $2^{TH}$
equally probable sequences, ignoring the remainder (the
rigorization of this argument will be omitted).
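The concentration of $-\log p(S)/T$ about H can be illustrated for the simplest ergodic source. The sketch below is an assumption-laden example, not the author's argument: it takes T independent binary symbols with P(1) = p (the function name `typical_mass` and the tolerance `eps` are hypothetical), and evaluates exactly the probability that $-\log_2 p(S)/T$ lies within `eps` of H. Binomial probabilities are computed through `lgamma` to avoid overflow for large T.

```python
from math import exp, lgamma, log, log2

def typical_mass(p, T, eps):
    """P{ |-log2 p(S)/T - H| < eps } for T independent binary symbols with P(1) = p."""
    H = -p * log2(p) - (1 - p) * log2(1 - p)
    mass = 0.0
    for k in range(T + 1):                          # k = number of ones in the sequence S
        rate = -(k * log2(p) + (T - k) * log2(1 - p)) / T
        if abs(rate - H) < eps:
            # log of the binomial probability of observing k ones in T symbols
            log_prob = (lgamma(T + 1) - lgamma(k + 1) - lgamma(T - k + 1)
                        + k * log(p) + (T - k) * log(1 - p))
            mass += exp(log_prob)
    return mass

for T in (100, 1000, 10000):
    print(T, typical_mass(0.2, T, 0.05))
```

The printed probabilities increase towards unity as T grows, which is the sense in which nearly all long sequences are typical.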

Hence for a message source with an information rate $R_0$, we
have in a long time T a set of $2^{TR_0}$ equiprobable relevant
sequences. We may associate these with the $2^{TH(U)}$ signal
sequences as we wish. Consider first their random association;
then the probability of a particular signal sequence being a
possible message will be $2^{T\{R_0 - H(U)\}}$. Each received signal
sequence may have come from $2^{TH_V(U)}$ transmitted signal
sequences, and the probability that none of these signal sequences
is a possible message (apart from the actual message) is P, where

$$\log P = 2^{TH_V(U)} \log\{1 - 2^{T[R_0 - H(U)]}\}.$$

Now if $R_0 < H(U) - H_V(U)$, we may write

$$R_0 - H(U) = -H_V(U) - \eta,$$

where $\eta > 0$. Hence $\log P \sim -2^{-T\eta} \to 0$ as T → ∞, or P → 1. The
above result shows that the average probability of error → 0 for
the combination of message and signal sequences in all possible
ways. It follows that there exists at least one combination for
which the probability of error → 0.
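The definitions (5) and (6) and the capacity max R may be made concrete for a channel not discussed in the text. The sketch below assumes a binary symmetric channel, in which each transmitted binary digit is reversed with probability `e`; the names `rate`, `capacity` and the grid search over the input probability `a` are illustrative assumptions. Entropies are per symbol, in binary digits.

```python
from math import log2

def entropy(probs):
    return -sum(q * log2(q) for q in probs if q > 0)

def rate(a, e):
    """R = H(U) - H_V(U) per symbol, for P{U = 1} = a and reversal probability e."""
    joint = {(0, 0): (1 - a) * (1 - e), (0, 1): (1 - a) * e,
             (1, 0): a * e,             (1, 1): a * (1 - e)}     # p(U, V)
    p_u = [1 - a, a]
    p_v = [joint[0, 0] + joint[1, 0], joint[0, 1] + joint[1, 1]]
    equivocation = entropy(joint.values()) - entropy(p_v)       # H_V(U), from (5)
    return entropy(p_u) - equivocation                          # equation (6)

def capacity(e, steps=1000):
    """Maximum of R over a grid of input probabilities (a crude search)."""
    return max(rate(i / steps, e) for i in range(steps + 1))

print(capacity(0.1))
```

For this symmetric channel the maximum is attained with equiprobable inputs, so that the grid search reproduces the value $1 + e\log_2 e + (1-e)\log_2(1-e)$.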

For $R_0 > R$, it is only possible to transmit with a finite
equivocation $R_0 - R$. This follows since we may transmit the fraction
$R/R_0$ of the message accurately, and the remainder gives an
equivocation $R_0 - R$.

The elimination of the noise effect has of course to be achieved 
by a redundancy in the coded or transmitted signal sequence 
compared with the original message sequence. This may have 
to be achieved in practice in a rather complicated way, but 
redundancy in the original message may reduce the need for it. 
We define redundancy in a message sequence by $1 - H/H_{\max}$.
For the English language, it appears to be well over 50%.

Use of continuous signal processes. With continuously varying 
signals the information content still remains effectively finite, 
owing to the presence of noise and to the finite resolving power of 
any apparatus. Using for convenience a Fourier representation
(corresponding closely to the common use of electromagnetic
signals), we assume that the channel is restricted to the frequency
band 0-W cycles per second. Now if any realized function x(t) is
so limited it is completely determined by its values at intervals
1/(2W). In fact we may write

$$x(t) = \sum_{n=-\infty}^{\infty} x_n \frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)} \tag{8}$$

(cf. J. M. Whittaker, 1935), where $x_n$ is the value of x(t) at
t = n/(2W). The restriction to the same frequency
range of spectral frequencies of a stationary process only
defined at values 1/(2W) apart is another aspect of this result.
Any section S of this function of duration T will thus be
represented by 2TW coordinates, and will correspond to a point in a
space of 2TW dimensions. With this representation, it should
be noticed that

$$\frac{1}{T}\int_0^T x^2(t)\,dt = \frac{1}{2TW}\sum_n x_n^2$$

when x(t) = 0 outside the interval 0-T, since

$$2W\int_{-\infty}^{\infty} \frac{\sin \pi(2Wt - m)}{\pi(2Wt - m)}\,\frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)}\,dt = \delta_{mn},$$

where $\delta_{mn}$ = 1 (m = n), 0 (m ≠ n). The average power P of the
signal is defined by the above mean square; thus the square of
the distance of the signal point from the origin is 2WTP.
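The expansion (8) and the accompanying mean-square identity can be verified on a simple band-limited function. The sketch below is only an illustration under stated assumptions: the test function $x(t) = \{\sin \pi t/(\pi t)\}^2$, whose spectrum is confined to the band 0-W with W = 1, the truncation of the series at |n| ≤ K, and the names `reconstruct` and `energy_from_samples` are all choices made here, not taken from the text.

```python
from math import pi, sin

W = 1.0                                    # band limit in cycles per second (assumed)

def sinc(u):
    return 1.0 if u == 0 else sin(pi * u) / (pi * u)

def x(t):
    return sinc(t) ** 2                    # band-limited to 0-W when W = 1

def reconstruct(t, K=200):
    """Series (8) truncated at |n| <= K, with coefficients x_n = x(n / 2W)."""
    return sum(x(n / (2 * W)) * sinc(2 * W * t - n) for n in range(-K, K + 1))

def energy_from_samples(K=200):
    """(1 / 2W) * sum of x_n**2, to be compared with the integral of x(t)**2."""
    return sum(x(n / (2 * W)) ** 2 for n in range(-K, K + 1)) / (2 * W)

print(x(0.3), reconstruct(0.3))
print(energy_from_samples())               # the integral of x(t)**2 is 2/3 for this x
```

The truncated series agrees with x(t) between the sampling points to within the truncation error, and the sum of squared coordinates divided by 2W reproduces the integral of $x^2(t)$, here equal to 2/3.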


The particular process defined by

$$X(t) = \sum_n A_n \frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)}, \tag{9}$$

where the $A_n$ are independent normal coefficients, all with the
same variance N, will be called ‘(band-limited) white noise’ of
average power N.

The information (entropy) measure J of a set of continuous 
variables is defined by $-E\{\log f\}$, where f is the probability
density function. In contrast with the measure for discrete 
variables, it is not defined absolutely, and will change with a 
change of coordinates affecting the density function. Entropy 
rates per second or per degree of freedom (one for each coordinate) 
are also readily defined. 




Entropy loss in linear filters. The change in entropy with a 
change of coordinates from x to y is easily seen to be 


$$\Delta H' = -E\{\log K\}, \tag{10}$$

where K is the Jacobian (∂x/∂y) of the transformation. For the
linear transformation y = Ax, this gives

$$\Delta H' = \tfrac{1}{2}\log |A|^2. \tag{11}$$


A linear filter merely changes each frequency ω by a factor y(ω),
and as each frequency has two degrees of freedom (sine and
cosine) it is readily shown (e.g. by taking a discrete set of
frequencies at intervals 2π/T and proceeding to the limit as T → ∞)
that

$$\Delta H' = \frac{1}{4\pi W}\int_0^{2\pi W} \log |y(\omega)|^2 \, d\omega. \tag{12}$$


Optimum entropy properties of white noise. It should be noted
that the distribution in one variable with maximum entropy for
a given standard deviation is not the uniform or rectangular
distribution, but the normal or Gaussian one. For if

$$J = -\int f(x)\log f(x)\,dx$$

is maximized subject to

$$\int f(x)\,dx = 1, \qquad \int x f(x)\,dx = 0, \qquad \int x^2 f(x)\,dx = \sigma^2,$$

we have, by the calculus of variations, to maximize

$$\int f(x)\,[\lambda x^2 + \mu x + \nu - \log f(x)]\,dx,$$

whence
$$-1 - \log f(x) + \lambda x^2 + \mu x + \nu = 0,$$
or
$$f(x) = \frac{1}{\sqrt{(2\pi\sigma^2)}}\,e^{-x^2/(2\sigma^2)}.$$

For this distribution it is easily shown that $J = \log\sqrt{(2\pi e\sigma^2)}$.
Similarly in n dimensions, for given second-order moments, the
n-dimensional normal distribution is the optimum. Further,
given average $\sigma^2$, the entropy is obviously a maximum for
independent variables and when all the σ's are equal.
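As a numerical check on the comparison above, the entropies of the rectangular and normal distributions with a common standard deviation may be evaluated directly. This is a small sketch under assumptions of our own: logarithms are taken to base 2, and the names `entropy_normal`, `entropy_uniform`, `normal_density` and `entropy_numeric` (a midpoint-rule quadrature of $-\int f \log f\,dx$) are hypothetical helpers.

```python
from math import e, exp, log2, pi, sqrt

def entropy_normal(sigma):
    return log2(sqrt(2 * pi * e * sigma ** 2))        # J = log sqrt(2 pi e sigma^2)

def entropy_uniform(sigma):
    # a rectangular distribution with standard deviation sigma has width sigma * sqrt(12)
    return log2(sigma * sqrt(12))

def normal_density(sigma):
    return lambda x: exp(-x * x / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

def entropy_numeric(f, lo, hi, steps=20000):
    """Midpoint-rule value of the integral of -f(x) log2 f(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = f(lo + (i + 0.5) * h)
        if fx > 0:
            total -= fx * log2(fx) * h
    return total

print(entropy_normal(1.0), entropy_uniform(1.0))
print(entropy_numeric(normal_density(1.0), -12.0, 12.0))
```

The normal value exceeds the rectangular one, and doubling σ adds exactly one binary digit to either entropy, which illustrates that the measure for continuous variables is not defined absolutely but depends on the scale of the coordinates.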

Long realizations of an ergodic process will again be typical
of the stochastic process, and we assume

$$-\log f(S)/n \to H' \tag{13}$$

in probability as n, the number of coordinates, tends to infinity.
We have also, since there are 2W coordinates per second, the
identity

$$2WH' = H. \tag{14}$$

For white noise $H' = \log\sqrt{(2\pi eN)}$, $H = W\log(2\pi eN)$. From (13)
we may think of the typical density function for a long realization
as $2^{-nH'}$, constant over a volume $2^{nH'}$. For example, in the
case of white noise, this volume is the sphere of radius $\sqrt{(nN)}$.

The entropy of white noise we have seen depends on its power
N. It is convenient to define the entropy power $N_1$ of any other
process as the white noise power having the same entropy H', i.e.

$$N_1 = 2^{2H'}/(2\pi e). \tag{15}$$

A useful inequality concerning the sum of two independent
series, Z = X + Y, is that

$$N_1(X) + N_1(Y) \le N_1(Z) \le N(X) + N(Y), \tag{16}$$

where $N_1$ and N denote entropy power and average power
respectively. The upper limit follows immediately because
$N_1(Z) \le N(Z) = N(X) + N(Y)$. To obtain the lower bound, let us
investigate the minimum entropy for Z, given the X and Y
entropies. We denote the density functions for X, Y and Z by
f, g and h respectively, the argument being a vector set of
coordinates u. Introducing undetermined multipliers, and


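remembering also that

$$\int f(u)\,du = \int g(u)\,du = 1, \qquad h(u) = \int f(v)\,g(u - v)\,dv,$$

we minimize

$$I = -\int \{h\log h - \lambda f\log f - \mu g\log g - \xi f - \eta g\}\,du,$$

$$\delta I = -\int \{(1 + \log h)\,\delta h - \lambda(1 + \log f)\,\delta f - \mu(1 + \log g)\,\delta g - \xi\,\delta f - \eta\,\delta g\}\,du$$

$$= -\int \{(1 + \log h)\,\delta h - \lambda\log f\,\delta f - \mu\log g\,\delta g - \xi'\,\delta f - \eta'\,\delta g\}\,du = 0,$$

for all permissible δf, δg. Since

$$\delta h(u) = \int g(u - v)\,\delta f(v)\,dv + \int f(u - v)\,\delta g(v)\,dv,$$

we obtain

$$\int g(u - v)\log h(u)\,du - \lambda\log f(v) - \xi'' = 0,$$

$$\int f(u - v)\log h(u)\,du - \mu\log g(v) - \eta'' = 0. \tag{17}$$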


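Consider the case X and Y normal, i.e.

$$f(u) = \frac{\exp\{-\tfrac12 u'A^{-1}u\}}{(2\pi)^{n/2}|A|^{1/2}}, \qquad g(u) = \frac{\exp\{-\tfrac12 u'B^{-1}u\}}{(2\pi)^{n/2}|B|^{1/2}}.$$

Then h(u) is normal with C = A + B, and

$$\int g(u - v)\log h(u)\,du = \int \{-\tfrac12 n\log 2\pi - \tfrac12\log|C| - \tfrac12 (u' + v')C^{-1}(u + v)\}\,\frac{\exp\{-\tfrac12 u'B^{-1}u\}}{(2\pi)^{n/2}|B|^{1/2}}\,du$$

$$= -\tfrac12 n\log 2\pi - \tfrac12\log|C| - \tfrac12 v'C^{-1}v - \tfrac12\,\mathrm{trace}\,(BC^{-1}).$$

From (17) we require the equality of this expression with

$$\lambda\{-\tfrac12 n\log 2\pi - \tfrac12\log|A| - \tfrac12 v'A^{-1}v\} + \xi'',$$

whence we must have $\lambda A^{-1} = C^{-1}$, or A = λC. Then B = (1 − λ)C,
and equations (17) can be satisfied by suitable choice of the
multipliers.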
Now for this normal case

$$J(X) = \tfrac12 n\log(2\pi e) + \tfrac12\log|A|, \tag{18}$$

etc., or as n → ∞,

$$N_1(X) = \lim |A|^{1/n} = \lambda N_1(Z), \qquad N_1(Y) = (1 - \lambda)N_1(Z).$$

Since in this case $N_1(Z) = N_1(X) + N_1(Y)$ is a minimum, we have
in general $N_1(Z) \ge N_1(X) + N_1(Y)$.
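The inequality (16) can be illustrated in the normal case, where (18) and (15) give the entropy power explicitly. The sketch below makes simplifying assumptions of our own: the coordinates are taken independent, so that the covariance matrices are diagonal and are represented by lists of variances, n is kept finite, and the function name `entropy_power` is hypothetical.

```python
from math import e, log2, pi, prod

def entropy_power(variances):
    """Entropy power N_1 of n independent normal coordinates with given variances."""
    n = len(variances)
    J = 0.5 * n * log2(2 * pi * e) + 0.5 * log2(prod(variances))   # equation (18)
    return 2 ** (2 * J / n) / (2 * pi * e)                          # equation (15), H' = J/n

def average_power(variances):
    return sum(variances) / len(variances)

def check(a, b):
    """Return the three members of (16) for Z = X + Y with X, Y independent."""
    c = [ai + bi for ai, bi in zip(a, b)]
    return (entropy_power(a) + entropy_power(b),
            entropy_power(c),
            average_power(a) + average_power(b))

print(check([1.0, 4.0, 9.0], [2.0, 8.0, 18.0]))   # proportional covariances
print(check([1.0, 4.0], [3.0, 1.0]))              # non-proportional covariances
```

When the two covariance matrices are proportional, the lower bound is attained, as in the case A = λC above; otherwise the entropy power of the sum lies strictly between the two limits.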

Capacity of a continuous noisy channel. By means of our 
enumerable system of coordinates (finite for finite T), we are 
able to define the capacity of our channel (in general affected by 
noise) as before, namely, max R, where R is the input entropy 




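minus the equivocation. If the coordinates for the transmitted
and received signals U, V are u, v, then (in formal notation)

$$R = H(U) - H_V(U)$$

$$= \lim_{T\to\infty}\frac{1}{T}\left\{-\int f(u)\log f(u)\,du + \iint f(u, v)\log\frac{f(u, v)}{f(v)}\,du\,dv\right\}$$

$$= \lim_{T\to\infty}\frac{1}{T}\iint f(u, v)\log\frac{f(u, v)}{f(u)f(v)}\,du\,dv. \tag{19}$$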


It should be noticed that this formula is invariant to a change 
of coordinates in u or v. Moreover, being defined here as the 
expected value of the logarithm of a likelihood ratio (cf. Chapter 8)
R exists for rather general U and V. In the case when the noise
V − U is independent of the signal U, so that $p_U(V) = p(V - U)$,
we have

$$R = H(U) - H_V(U) = H(U) + H(V) - H(U, V) = H(V) - H_U(V) = H(V) - H(V - U). \tag{20}$$

Maximizing R for variation of U means maximizing H(V), since
the noise entropy is independent of U.
Hence to maximize R for given average signal power P we
must if possible choose V to correspond to ‘white noise’ with
power P + N, where N is the average noise power. Since we cannot
do better than this, we have

$$R \le \max\{H(V)\} - W\log(2\pi eN_1) \le W\log\{2\pi e(P + N)\} - W\log(2\pi eN_1),$$

where $W\log(2\pi eN_1)$ is the noise entropy H(V − U). If the noise
is white noise, $N_1 = N$ and V is white noise when U is made white
noise.† In this case we have the equality

$$C = W\log(1 + P/N). \tag{21}$$

A comparison of the capacities of some standard systems with
this ideal capacity has been made by Shannon (1949).

† This may ideally be achieved by an asymptotic coding procedure, as in
the discrete channel case. (The messages are here still assumed discrete.)

In other cases, suppose we still make U white noise. Then
from the result (16) we have for the received signal an entropy
power not less than $P + N_1$; that is,

$$\max\{H(V)\} \ge W\log\{2\pi e(P + N_1)\}$$

and
$$C \ge W\log(1 + P/N_1).$$

We have thus obtained in general

$$W\log\frac{P + N_1}{N_1} \le C \le W\log\frac{P + N}{N_1}. \tag{22}$$

For N/P small, these limits become nearly the same. 
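The results (21) and (22) are easily evaluated. The following sketch is illustrative only: logarithms are taken to base 2, so that capacities are in binary digits per second, and the function names and numerical values are assumptions introduced here.

```python
from math import log2

def capacity_white(W, P, N):
    """Equation (21): capacity for band-width W, signal power P, white noise power N."""
    return W * log2(1 + P / N)

def capacity_bounds(W, P, N, N1):
    """Equation (22): limits on C for noise of average power N and entropy power N1."""
    lower = W * log2((P + N1) / N1)
    upper = W * log2((P + N) / N1)
    return lower, upper

print(capacity_white(3000.0, 1000.0, 1.0))
print(capacity_bounds(1.0, 15.0, 1.0, 0.5))
```

With $N_1 = N$ the two limits coincide with (21), and for a fixed noise the gap between them narrows as P is increased.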

Since $N_1 < N$ except for white noise, (22) shows that white
noise is the worst type of noise. For other Gaussian noise (with 
varying power for differing frequencies), it is evident that we 
can obtain white noise for V if the average power P + N is greater 
than the maximum noise power at any frequency, by choosing 
the appropriate power for the signal, made Gaussian, at each 
frequency. Thus the upper limit in (22) is then reached. It 
cannot be strictly reached in other cases, for no Gaussian 
variable can be resolved into two independent non-Gaussian 
variables.

Continuous messages. In the preceding section we have seen 
that the capacity of a noisy continuous channel, with band- 
limited frequency and finite signal and noise power, is finite, in 
relation to the transmission of a discrete message sequence with 
finite entropy rate. This may be immediately seen in terms of the 
geometrical representation of the process. While we have
represented our signals in an entire space of 2TW dimensions, the
restriction on signal power implies for long sequences restriction
to signal points within a distance $\sqrt{(2TWP)}$ of the origin. Noise
shifts each signal point. For example, with white noise the
received signals lie within a distance $\sqrt{[2TW(P + N)]}$ of the
origin, and $\sqrt{(2TWN)}$ of the corresponding transmitted signals.
The number of distinguishable signals thus remains finite, and
for white noise is limited by the volume ratio $(1 + P/N)^{TW}$, as
we saw in result (21).

If the message source also refers to a continuously varying 
quantity, it is therefore not possible to transmit messages 
exactly, for their exact specification would require an infinite 
number of binary digits at each instant. However, given some 
permissible tolerance, we may replace the original message 
source X by an equivalent coarser signal. This tolerance v will 
in general be defined in terms of some measure of the difference 
between the transmitted and finally received process, e.g. their 
mean-square discrepancy. It will be assumed that the stochastic
process corresponding to the message source can be represented
to as close an approximation as desired by coordinate
representations of the type considered above. For any such
approximating representation, a length T will still have a finite set x of
coordinates. If the final received message Y is represented by y,
we shall define the entropy rate of the source, relative to a given
fidelity or tolerance v, by $R_0 = \min R$, where R is defined, as in
(19), by

$$R = \lim_{T\to\infty}\frac{1}{T}\iint f(x, y)\log\frac{f(x, y)}{f(x)f(y)}\,dx\,dy,$$


f(x), f(y) denoting the (different) density functions of X and Y 
and f(x, y) the simultaneous density function. The minimum is 
to be obtained for constant v, by varying the conditional dis- 
tribution of Y for given X. 


With this definition, a source with rate $R_0$ can be transmitted
over a channel of capacity C if $R_0 \le C$; but not if $R_0 > C$. The last
part of this theorem follows immediately, since $R_0$ is an R, and
no R exceeds C. To establish the first part, suppose $f_0(X, Y)$
corresponds to the system which minimizes R to $R_0$. Without
repeating previous arguments in detail, we can see that, by
suitable encoding, the received messages Y could have been
transmitted at rate $R_0$ with arbitrarily small equivocation, and
thus be available at the transmitter. The transition from an X
to a Y is arranged according to $f_0(X, Y)$, this ‘coarsening’ of the
message closely corresponding to coarsening of a message through
noise. We choose at random $2^{R_0T}$ ‘high-probability’ Y's out of
the total such set of $2^{TH(Y)}$, and associate, with each, $2^{TH_Y(X)}$ X's.
As T increases this covers almost all X's, for each of which at
least one appropriate Y in the related set will thus be available.
From the ergodic assumptions this selection of the transmitted
Y's does not affect the fidelity v, which is ensured for each ‘high
probability’ pair X, Y without further averaging.

To illustrate this result, suppose that v is based on the
mean-square discrepancy, and that the original message source is
white noise. Then

$$R_0 = \min\{H(X) - H_Y(X)\} = H(X) - \max\{H_Y(X)\}.$$

But $\max\{H_Y(X)\}$ occurs when the ‘noise’ Y − X is white noise,
and is $W_1\log(2\pi ev)$, where $W_1$ is the band-width of the message
source. Thus

$$R_0 = W_1\log(2\pi eQ) - W_1\log(2\pi ev) = W_1\log(Q/v), \tag{23}$$

where Q is the average message power. More generally, for any
message source of band-width $W_1$, Q average power and $Q_1$ entropy
power,

$$W_1\log(Q_1/v) \le R_0 \le W_1\log(Q/v). \tag{24}$$

The lower bound follows because $H(X) = W_1\log(2\pi eQ_1)$ and the
value above for $\max\{H_Y(X)\}$ can only be reached in the case of
white noise. The value for the upper bound may be seen from the
geometrical representation. In the message space the message
point is confined to a sphere of radius $\sqrt{(2TW_1Q)}$, and the
root-mean-square discrepancy $\sqrt{v}$ allows a distance $\sqrt{(2TW_1v)}$ from
message X to the received message Y. The consequent entropy
rate, determined by the number of different messages which
can be transmitted, is restricted by the ratio of these two
volumes.
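The bounds (24), together with the condition $R_0 \le C$ for transmission, may be sketched as follows. The function names, the use of logarithms to base 2, and the numerical values are assumptions made for illustration only; in particular `transmissible` merely compares a rate with a capacity and says nothing about how the coding is to be achieved.

```python
from math import log2

def source_rate_bounds(W1, Q, Q1, v):
    """Equation (24): limits on R_0 for band-width W1, power Q, entropy power Q1, tolerance v."""
    return W1 * log2(Q1 / v), W1 * log2(Q / v)

def transmissible(R0, C):
    """A source of rate R_0 can be sent over a channel of capacity C if R_0 <= C."""
    return R0 <= C

lower, upper = source_rate_bounds(1.0, 4.0, 2.0, 0.5)
print(lower, upper)
print(transmissible(upper, 3.5), transmissible(upper, 2.5))
```

For a white-noise source $Q_1 = Q$ and the two limits coincide with (23).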

Further notes. In practice, apart from the technical difficulty 
of achieving optimum results, the above criteria of efficiency 
may not of course always be the most relevant. This limitation 
refers not merely to the use of the root-mean-square discrepancy 
between original and received message as a measure of fidelity, 
but to the ignoring of finite delays (however long) between
message and reception in comparison with the indefinitely long
processes considered. The rapid transmission of short signals
would thus require separate consideration.

It is also evident that the theory would not necessarily be 
relevant to the important problem of detecting slight signals, 


detection of a signal in cases where the probability or likelihood
function of the received data S is known to be $f(S \mid H_0)$ in the
presence of noise alone and $f(S \mid H)$ if the signal is also present
must be based on the likelihood ratio $f/f_0$ (cf. Lawson and
Uhlenbeck (1950), Chapter 7). Similarly the efficient estimation
of, say, a target distance θ by a radar signal must conform to the
principles of estimation summarized in Chapter 8.
Chapter 8


THE STATISTICAL ANALYSIS OF
STOCHASTIC PROCESSES


8·1 Principles of statistical inference

The statistical analysis of stochastic processes arising in nature 
does not differ in principle from the analysis of other types of 
statistical data, but the existence of some dependence or
continuity in the successive observations will often mean that the
classical methods become inadequate, and need extension. 
Moreover, there are repercussions on the practical side, for 
unless the statistician has a well-defined and realistic model of 
the actual process he is studying, his analysis is likely to be 
abortive. It is of course true that a statistician must always be 
fully cognisant of how his data were collected and of any other 
relevant information, but in the case of stochastic processes this 
ancillary information should certainly include as thorough a
theoretical knowledge of the mechanism and structure of the
process as possible. This is largely because dependence has so
many more possibilities a priori than independence that these
will usually need to be drastically restricted in any particular
context. However, before we consider the problem of inference
for stochastic processes further it will be as well to summarize
the statistical principles to be used (for further details the reader
is referred to Cramér (1946) or M. G. Kendall (1946)).

The problems of statistics may broadly be classified into
(i) problems of specification, (ii) problems of statistical inference.
Theoretically we may discuss the first without the second, as we
have been doing for stochastic processes up till now in this book,
but, as stressed above, the converse is not true. It is also perhaps
true to add that the practical use of theoretical specifications
can hardly be separated from inference problems, for one of the
latter’s functions is to check the adequacy of the specification.
The more detailed the specification the narrower is the inference
problem, but at the same time such a detailed specification may
prove untenable as a representation of the data.
Where possible, the specification or probability model H should
specify precisely the probability of the data S (and of any
alternative set of data S' which might have arisen under the same
conditions). It is sometimes important to distinguish between
H and the ‘structural model’ leading to H, though if two
different structural models give rise to the same H, no statistical
analysis can of course discriminate between them. The
probability P{S | H} will be denoted by p. If S refers to observations
having a continuous range of possible values, p will be strictly
zero, but we may consider alternatively the density function
f(S | H). The function p or f is called the likelihood function, and
we shall denote log p or log f by L.

Statistical estimation. In many cases, while H is not known
exactly, it may be known or provisionally assumed known apart
from one or more unknown constants $\theta_i$ (each with possible
values assumed extending over a continuous range). In the case
of one unknown only, we shall refer to it as θ. In the reference
to sequential analysis in § 4·1 the information function introduced
by R. A. Fisher has already been defined as

$$I(\theta) = E\left\{\left(\frac{\partial L}{\partial\theta}\right)^2\right\} \qquad \left(E\left\{\frac{\partial L}{\partial\theta}\right\} \text{ assumed zero}\right). \tag{1}$$

In the case of more than one unknown, we define the information
matrix

$$I_{ij} = E\left\{\frac{\partial L}{\partial\theta_i}\frac{\partial L}{\partial\theta_j}\right\}. \tag{2}$$

Then under suitable conditions (the most important practical
condition to check is noted in (1), that E{∂L/∂θ} is zero, this
usually following if the range of the random variables does not
depend on θ) we have for any estimating function T(S)

$$E\{(T - \theta)^2\} \ge b^2 + (1 + \partial b/\partial\theta)^2/I(\theta), \tag{3}$$

where b = E{T} − θ is the bias in T. For unbiased estimates b = 0
and (3) reduces to

$$\sigma^2 \ge 1/I(\theta), \tag{4}$$

where $\sigma^2$ is the variance of T. The estimating function T may be
any appropriate function of the observations; if it is a linear
function, we shall call T a linear estimate.
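Relations (1) and (4) can be checked exactly in a simple case. The sketch below assumes, purely for illustration, that S consists of n independent trials each with probability θ of success, so that L = k log θ + (n − k) log(1 − θ) apart from a constant, k being the number of successes; the helper names `score`, `expect` and `information` are hypothetical. Expectations are formed by summation over the binomial distribution of k.

```python
from math import comb

def score(k, n, theta):
    """dL/dtheta for k successes in n independent trials."""
    return k / theta - (n - k) / (1 - theta)

def expect(g, n, theta):
    """Exact expectation of g(k) when k is binomial (n, theta)."""
    return sum(comb(n, k) * theta ** k * (1 - theta) ** (n - k) * g(k)
               for k in range(n + 1))

def information(n, theta):
    """I(theta) = E{(dL/dtheta)^2}, equation (1)."""
    return expect(lambda k: score(k, n, theta) ** 2, n, theta)

n, theta = 20, 0.3
mean_score = expect(lambda k: score(k, n, theta), n, theta)       # should vanish
variance_T = expect(lambda k: (k / n - theta) ** 2, n, theta)     # T = k/n is unbiased
print(mean_score, information(n, theta), variance_T * information(n, theta))
```

Here the unbiased estimate T = k/n attains the limit in (4), the product of its variance and I(θ) being unity; this is a special property of the example, not a general one.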

Restricting ourselves for simplicity to unbiased estimates (a
wide class of estimates is at least asymptotically unbiased in
large samples), we have in the case of more than one unknown
that the covariance matrix of the set of estimates $T_i$ of $\theta_i$ is
bounded below by the inverse $(I^{ij})$ of the information matrix
$(I_{ij})$, by which we mean in particular that $\sigma_i^2$, the variance of $T_i$,
is not less than the corresponding diagonal element of $(I^{ij})$ (and
in general that similar inequalities hold for any linear
transformation of the estimates).


The condition for the equality sign in (4) is that

$$\frac{\partial L}{\partial\theta} = I(\theta)(T - \theta). \tag{5}$$

The corresponding set of conditions for several unknowns $\theta_i$ is

$$\frac{\partial L}{\partial\theta_i} = \sum_j I_{ij}(T_j - \theta_j) \quad \text{for all } i. \tag{6}$$

The ‘maximum-likelihood’ estimates $\hat\theta_i$ are defined as any set
of values of $\theta_i$ that maximize L. In particular when $\partial L/\partial\theta_i$ exist,
they satisfy the equations

$$\frac{\partial L}{\partial\theta_i} = 0. \tag{7}$$

Under some further conditions, which have until recently usually
included the assumption of independent observations, the
estimates $\hat\theta_i$ have the property that as the number n of
observations is increased they tend to be normally distributed about $\theta_i$
with their covariance matrix the optimum compatible with the
above results. Thus while optimum estimates in the above
variance sense may not exactly exist (if they do, they are
identical with the maximum likelihood estimates) equation (7)
will provide estimates which have these optimum properties at
least asymptotically. The relation

$$E\left\{\frac{\partial L}{\partial\theta_i}\frac{\partial L}{\partial\theta_j}\right\} = -E\left\{\frac{\partial^2 L}{\partial\theta_i\,\partial\theta_j}\right\} \tag{8}$$

usually holds, and is then often convenient for evaluating $I_{ij}$.
The asymptotic approximation (cf. equation (7), § 7·2)

$$-\frac{\partial^2 L}{\partial\theta_i\,\partial\theta_j} \sim I_{ij} \tag{9}$$

is also useful.
Asymptotic confidence intervals. The classical large-sample
estimation procedure is in effect to use the normal approximation,
the standard error indicating on this basis the possible interval
on either side of the estimate in which the unknown is likely
to lie. This procedure may often be improved where worth while
by making use directly of quantities such as L or ∂L/∂θ. For
example, under the conditions assumed above, we have the
quantity L' ≡ ∂L/∂θ with mean zero and variance I(θ). For
independent observations at least, L' tends to normality, and
an approximate confidence interval† is obtained by solving
the equation

$$T(\theta) \equiv L'/\sqrt{I(\theta)} = \pm\lambda, \tag{10}$$

where ±λ are the upper and lower limits corresponding to any
stipulated probability risk of a standardized normal variate
falling outside these limits. Further improvements can be
obtained by studying further the skewness or other higher
moments of L'. Thus the general formula for the third cumulant
of L' is

$$\kappa_3 = 2E\left\{\frac{\partial^3 L}{\partial\theta^3}\right\} + 3\frac{\partial I}{\partial\theta}, \tag{11}$$

and in place of (10) we may use the equation

$$T(\theta) - \tfrac16\kappa_3(\lambda^2 - 1)/I^{3/2} = \pm\lambda. \tag{12}$$

The correcting term is O(1/√n), and the neglected terms are
O(1/n). Similar modifications are possible in the case of more

While these asymptotic methods have been replaced where 
feasible by more precise small-sample methods, they have a wide 
range of applicability, and as precise small-sample methods are 
less often available for the stochastic process problems we shall


† Anyone unfamiliar with the precise probability interpretation of a
confidence interval should also consult the references already mentioned (Cramér,
Kendall (1946)).
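The interval defined by equation (10) above may be worked out explicitly in a simple case. The sketch assumes, for illustration only, n independent Poisson observations with mean θ and total s, for which L' = s/θ − n and I(θ) = n/θ; solving $T(\theta) = \pm\lambda$ then reduces to a quadratic in θ. The function names and the value λ = 1.96 are assumptions of this example, and the skewness correction (12) is not included.

```python
from math import sqrt

def T(theta, s, n):
    """Standardized score L'/sqrt(I) for n Poisson observations with total s."""
    return (s / theta - n) / sqrt(n / theta)

def poisson_score_interval(s, n, lam=1.96):
    """Roots of T(theta) = +lam and T(theta) = -lam, i.e. of (s - n theta)^2 = lam^2 n theta."""
    centre = 2 * s + lam ** 2
    half = lam * sqrt(4 * s + lam ** 2)
    return (centre - half) / (2 * n), (centre + half) / (2 * n)

low, high = poisson_score_interval(30, 10)
print(low, high, T(low, 30, 10), T(high, 30, 10))
```

The limits are not symmetrical about the estimate s/n, which is one respect in which this procedure differs from the use of the estimate with its standard error.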
unbiased with minimum variance in the group of linear estimates.
As we shall see in Chapter 9, this property may be extended in 
a certain asymptotic sense to estimation problems for auto- 
regressive series. 

Statistical tests. For testing one hypothesis $H_0$ against a rival
hypothesis H the best criterion to use is the ‘likelihood ratio’
$p/p_0$, or $f/f_0$ if only densities exist, where $p_0 \equiv P\{S \mid H_0\}$, etc., or
equivalently $L - L_0$ (this was assumed in the sequential sampling
rule of § 4·1). If p has the form

$$p \equiv P\{S \mid H\} = P\{U \mid H\}\,P\{S \mid U\},$$

where the last factor is independent of H (or the aspects of it
which are in doubt) and U is a reduced set of statistics obtained
from the original observations S, then $p/p_0$ is a function only
of U, which is said to be a set of sufficient statistics in regard
to H. In particular, where there is only one unknown θ, U may
be a single quantity or ‘statistic’ T.

The manner of using $p/p_0$ will in general depend on the
situation, especially if the rival hypothesis H is merely one of a class.
But in the probability space of S a certain region will be
favourable to H, as against $H_0$, and this is defined by some condition
$p/p_0 \ge A$. When a sufficient statistic T exists, the region will be
defined in terms of critical values of T. It may be noted that if
equation (5) holds then T is sufficient (but the converse need
not hold).

In certain cases when H and $H_0$ are not completely specified,
but depend on further ‘nuisance’ parameters which are unknown,
it is sometimes possible and desirable to remove these completely
by considering the conditional probabilities P{S | H, U}, where
U is a set of sufficient statistics for the nuisance parameters.
When these cannot be removed exactly, they may be removable
approximately by the substitution of their maximum-likelihood
estimates.

Suppose we now wish to test the ‘goodness of fit’ of a model
specified entirely by the set of parameters $\theta_i$, $\phi_j$, where $\theta_i$
(i = 1, ..., r) are supposed known and $\phi_j$ (j = 1, ..., s) unknown.
The alternative class of hypotheses for comparison has both $\theta_i$, $\phi_j$
unknown. For a useful asymptotic test we substitute for $\phi_j$ in
the former case their maximum-likelihood estimates ($\theta_i$ given),
and for both $\theta_i$, $\phi_j$ their simultaneous estimates in the latter case.
Denote $\theta_i$, $\phi_j$ jointly by $\psi_m$. Then for $\hat\psi_m - \psi_m$ small,

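$$L(\hat\psi_m) - L(\psi_m) \sim \tfrac12\sum_{m,n}(\hat\psi_m - \psi_m)\left(-\frac{\partial^2 L}{\partial\psi_m\,\partial\psi_n}\right)(\hat\psi_n - \psi_n) \equiv \tfrac12\chi_1^2,$$

$$L(\theta_i, \hat\phi_j(\theta)) - L(\theta_i, \phi_j) \sim \tfrac12\sum_{j,k}(\hat\phi_j(\theta) - \phi_j)\left(-\frac{\partial^2 L}{\partial\phi_j\,\partial\phi_k}\right)(\hat\phi_k(\theta) - \phi_k) \equiv \tfrac12\chi_2^2,$$

whence, by subtraction,

$$2[L(\hat\psi_m) - L(\theta_i, \hat\phi_j(\theta))] \sim \chi_1^2 - \chi_2^2 \equiv \chi_3^2, \tag{13}$$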


where, if the standard asymptotic properties for the
maximum-likelihood estimates hold, $\chi_1^2$, $\chi_2^2$ and $\chi_3^2$ are on the basic or null
hypothesis $\chi^2$ quantities (sums of squares of independent normal
variables with zero means and unit variances) with degrees of
freedom (number of independent variables) r + s, s and r
respectively. When s = 0, (13) becomes $-2[L - L(\hat\theta_i)]$ or $\chi^2$ with r
degrees of freedom. The asymptotic $\chi^2$ form for the expression
in (13) depends on the sample size n being large; it is equivalent
to a $\chi^2$ form in distribution up to and including the O(1/√n)
terms, but neglecting O(1/n).
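The case s = 0 of (13) may be illustrated by a simple goodness-of-fit test. The example below is an assumption of ours and not taken from the text: the data are counts in three classes with completely specified class probabilities, so that r = 2, and the statistic is $2[L(\hat\theta) - L(\theta)] = 2\sum n_i \log\{n_i/(np_i)\}$. For two degrees of freedom the upper tail of $\chi^2$ is $e^{-x/2}$ exactly, which avoids the need for tables; the function names are hypothetical.

```python
from math import exp, log

def lr_statistic(counts, probs):
    """2[L(theta-hat) - L(theta)] for multinomial counts and specified class probabilities."""
    n = sum(counts)
    return 2 * sum(c * log(c / (n * p)) for c, p in zip(counts, probs) if c > 0)

def chi2_tail_2df(x):
    """Exact probability that chi-squared with 2 degrees of freedom exceeds x."""
    return exp(-x / 2)

stat = lr_statistic([60, 25, 15], [0.5, 0.3, 0.2])
print(stat, chi2_tail_2df(stat))
```

As with (13) generally, the reference to $\chi^2$ is asymptotic and should be trusted only when the counts are reasonably large.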

All statistical inference problems do not fall under these 
headings of estimation, discrimination and goodness of fit tests, 


but we shall see that they cover most problems of analysis arising 
from stochastic processes. 


8·11 Application to stochastic processes. If we now
attempt to survey the various types of data arising from stoch- 
astic processes some types may be classed as analysable by 
classical methods. The simplest type is (a) the purely random 
sequence of independent events, though here one should add the 
warning that if the order and independence of the observations 
is a feature to be tested, as in sequences of random numbers, the 
relevant tests will be those belonging in general to dependent 
sequences (see the analysis of probability chains in §8-2). Any 
non-random component can be tested and separated, so that the 
classical tests of means and regression coefficients were many 
years ago extended to test for the significance of strictly periodic 
components in time-series. But again it will be noticed that the 
validity of this procedure depends on the class of alternative 
hypotheses envisaged, and if in the last case the alternative is
a non-deterministic time-series with continuous spectrum, the
classical procedure will be inadequate. A second general type of
problem analysable by classical methods occurs if (b) independent
repetitions of the process are available, and some particular
feature such as the number of bacteria or number of particles at
a given time is observed. Examples are the numbers of mutated
bacteria in replicate cultures (§ 4·31) and the numbers of electrons
in a cascade shower (mentioned in §§ 3·4 and 3·42). Here the
theory of stochastic processes is essential to provide a model
for the observed phenomena, but the replication still allows
classical methods of analysis and comparison of the data with
the theoretical model. In many applications, however, the
dependence of successive observations in a single observed
sequence, or, if more than one sequence are available, a mutual
interdependence of the observations, makes new methods
necessary. For continuous time-records, a re-examination of the
inference problem is also clearly necessary.

In the case of a random sequence for which a finite number of 
observations is available no new difficulty in principle arises, 
for the methods summarized in §8-1 are largely applicable to 
dependent as well as independent observations. The known 
theorems on the asymptotic theory of maximum-likelihood 
estimates do not in general apply to dependent observations, 
and have to be extended. These extensions, required also for
the asymptotic properties of L', are associated with the
extension of the Central Limit Theorem to dependent observations.
Another more practical difficulty is that the distribution and
dependence of the observations representing stochastic processes
may well in many cases be imperfectly known, so that the precise
formulation of the likelihood function may not be feasible
without excessive and dubious idealization; in such cases methods
of broader validity (like the ‘least-squares’ estimates to be
considered in the discussion of autoregressive series) are
advisable.

When we consider continuous time-records, the further 
theoretical problem is to set up an equivalent of the probability 
function for such & record of given duration; this may be done 
if, as has been assumed in this book (cf. $ 1:3), we can describe 
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it fully by an enumerable sequence of coordinates. It will be 
sufficient for our purpose to note two of the most useful ways of 
doing this: 

(1) For processes m.s. continuous we may consider the values
at n points $t_1, \ldots, t_n$, and then let n increase so that $\max (t_r - t_{r-1})$
decreases to zero.

(2) For processes for which dX(t) is zero except at an enumerable
set of random times $T_1, T_2, \ldots, T_N$ (N also random with
$P\{N < \infty\} = 1$), then the probability may be specified in terms of
these times and the corresponding values of dX(t), together with
X(0) and N. (An example is the birth-and-death process of § 3-4.)

More abstract representations have been considered by 
Grenander (1950), who has shown that it is legitimate to evaluate 
the likelihood ratio for the data either as in (2) (when available) 
or as the limit of (1) as n increases. This likelihood ratio, if
evaluated for a fixed hypothesis $H_0$ in the denominator, can
obviously also be used in estimation problems. The exact or
‘small-sample’ theory of estimation then still applies. The problem
of elucidating minimum conditions under which the
asymptotic properties of maximum-likelihood estimates hold 
becomes at first sight even more formidable, though by repre- 
sentations such as (1) or (2) the problem should be reducible 
to a random-sequence problem, or the limit of one. We should 
moreover expect, by analogy with the relatively simple case of 
probability chains discussed in the next section, that in the case 
of completely stationary and ergodic processes for which the 
dependence drops off sufficiently rapidly the classical asymptotic 
properties will still apply. Relevant results for autoregressive
and other sequences will be referred to in due course.


8-2 The analysis of probability chains

It is instructive to examine in some detail the inference 
problem for a simple probability chain, which we shall suppose 
to be a stochastic process defined for discrete values of both 
variable and parameter, and whose probability dependence does 
not extend more than a finite number of intervals (k, say). For 
example, we may wish to test the adequacy of a Markov chain
model to the observational sequences obtained by Svedberg
and Westgren (see Chandrasekhar, 1943) from their counts of
colloidal particles in a small volume. We assume at present that
the total number s of states is finite; in addition to stationarity
(or ultimate stationarity) of the series, we shall also require to
guarantee an ergodic property of the type called ‘positive regularity’
in Chapter 2.

Denote the random sequence by

$$S \equiv X_1, X_2, \ldots, X_n,$$

where the suffices to X refer of course to its serial order, and not
to its realized value. The probability of this sequence S is

$$P\{S\} = P\{X_1\}\, P\{X_2 \mid X_1\}\, P\{X_3 \mid X_1, X_2\} \cdots P\{X_k \mid X_1, \ldots, X_{k-1}\} \times \prod_{i=1}^{n-k} P\{X_{i+k} \mid X_i, \ldots, X_{i+k-1}\}. \tag{1}$$

The variable X can take s values denoted conventionally by the
states 1, 2, ..., s, and hence a subsequence $X_a, X_{a+1}, \ldots, X_{a+k}$
can take $s^{k+1}$ ‘values’ (specified by the simultaneous values
of the k+1 X's). Let the frequency of any such specified value
(i, j, ..., q, r) for the subsequence of length k+1 be $N_{ij\ldots qr}$.
For brevity we shall often denote the value (i, j, ..., q) of the
subsequence $X_a, \ldots, X_{a+k-1}$ by u, and correspondingly the
frequency $N_{ij\ldots qr}$ by $N_{ur}$. We denote also the conditional probability
of the value r for the last variable of a subsequence of length
k+1, given the value u for the subsequence of length k, by $p_{ur}$.
Then (1) may be written

$$\log P\{S\} = \sum_{j=1}^{k} \log P\{X_j \mid X_1, X_2, \ldots, X_{j-1}\} + \sum_{u,r} N_{ur} \log p_{ur}. \tag{2}$$

As n increases, the second sum in (2) will become the dominant
part of log P{S}. The maximum-likelihood estimates of the $p_{ur}$
are given by finding the maximum of L = log P{S} subject to
the condition

$$\sum_r p_{ur} = 1, \tag{3}$$

whence we easily obtain for n large enough the estimates

$$\hat p_{ur} = N_{ur}/N_u, \tag{4}$$

where

$$N_u = \sum_r N_{ur}. \tag{5}$$
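The estimates (4) and (5) amount to counting runs of length k+1 and normalizing by the counts of their leading k terms. A minimal sketch follows; the function names and the short example sequence are illustrative assumptions, not taken from the text.

```python
from collections import Counter

def transition_counts(seq, k=1):
    """Return N_ur: how often each run u of k states is followed by state r."""
    counts = Counter()
    for a in range(len(seq) - k):
        u = tuple(seq[a:a + k])
        counts[(u, seq[a + k])] += 1
    return counts

def ml_transition_estimates(counts):
    """Return N_ur / N_u, where N_u is the sum of N_ur over r, as in (4) and (5)."""
    totals = Counter()
    for (u, _r), n_ur in counts.items():
        totals[u] += n_ur
    return {(u, r): n_ur / totals[u] for (u, r), n_ur in counts.items()}

# Hypothetical two-state sequence, used only to exercise the functions.
example = [0, 1, 1, 0, 1, 0, 0, 1]
counts = transition_counts(example, k=1)
estimates = ml_transition_estimates(counts)
```

For each observed u the estimates sum to one over r, which is the condition (3).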
The goodness of fit criterion $-2[L - L_{\max}]$, or Λ, say, for n large
thus becomes

$$-2 \sum_{u,r} N_{ur} \log (p_{ur} N_u / N_{ur}) = 2\Big[\sum_{u,r} N_{ur} \log (N_{ur}/m_{ur}) - \sum_u N_u \log (N_u/m_u)\Big], \tag{6}$$

where $m_{ur} = nP_{ur} = nP_u p_{ur}$, $m_u = nP_u$ ($P_{ur}$, $P_u$ denoting absolute
probabilities of the ‘values’ (u, r) and u). In the case k = 0, we
have the independent case; (2) becomes exactly

$$L = \sum_r N_r \log p_r \tag{7}$$

and (6) becomes

$$\Lambda = 2 \sum_r N_r \log (N_r/m_r). \tag{8}$$


It is well known that the expression (8) is the appropriate
criterion in this case, and has asymptotically the χ² distribution
with s−1 degrees of freedom as n increases. If we wish we may
replace it by the χ² expression

$$\chi^2 = \sum_r \frac{(N_r - m_r)^2}{m_r}, \tag{9}$$

to which it is approximately equivalent as the $m_r$ increase.
Similarly, we may replace the sums in (6) by sums like (9) if we
wish, though there are some advantages in retaining the natural
form (6).

One incidental result from (2) is that if the $N_{ur}/n$ are consistent
estimates of $P_{ur} = P_u p_{ur}$ (this follows from results established
below), then

$$\lim_{n \to \infty} \frac{L}{n} = \sum_{u,r} P_u p_{ur} \log p_{ur}. \tag{10}$$

This formula is important in the theory of communication
(see § 7-2).

From the definition of (6) it might be hoped that its asymptotic
distribution will be the χ² distribution. Now if we may assume
that the asymptotic distribution of the $N_{ur}$ is normal, it follows
that the criterion Λ has such a limiting distribution. This would
follow indirectly as a special case of equation (13), § 8-1. More
directly, for any limiting non-degenerate normal distribution
in q variables denoted by the (column) vector X, we have (in
vector and matrix notation)

$$P\{X\} \propto \exp \{-\tfrac{1}{2}(X - m)' V^{-1} (X - m)\} \tag{11}$$

and

$$-2[\log P\{X\} - \log P_{\max}\{X\}] = (X - m)' V^{-1} (X - m), \tag{12}$$

which is a χ² with q degrees of freedom. If the distribution is
degenerate, with r constraints, we see, by an orthogonal transformation
of the variables, that Λ has a χ² distribution with q−r
degrees of freedom.

The asymptotic normality of the $N_{ur}$ will be demonstrated
below. First of all, however, we may note the number of degrees
of freedom for Λ. The $N_{ur}$, even if normal, are not unrestricted.
By definition in (5) $N_u = N_{ij\ldots q}$ is obtained from $N_{ur} = N_{ij\ldots qr}$
by summing over r. Alternatively, if we consider the frequencies
$N_{hu} = N_{hij\ldots q}$, and obtain $N'_u$, say, by summing over the first
suffix h, we are obtaining a total frequency for subsequences of
k consecutive terms, of exactly the same type u as those represented
in $N_u$. We have therefore the linear relation for each u
($s^k$ of them)

$$N'_u \equiv \sum_h N_{hij\ldots q} = \sum_r N_{ij\ldots qr} \equiv N_u, \tag{13}$$

except perhaps for an end-effect, in the total sequence, which
becomes negligible as n increases. The relations (13) are not
algebraically independent, for summing over all u gives the same
total frequency n', say, on each side. However, we have also
the condition

$$\sum_{u,r} N_{ur} = n', \tag{14}$$

where n' = n − k ~ n is fixed. Hence there are $(s^k - 1) + 1 = s^k$
restrictions, and the number of degrees of freedom for $N_{ur}$ and
hence for Λ will be (provided all $m_{ur}$ are non-zero)

$$s^{k+1} - s^k = s^k(s - 1). \tag{15}$$

† It may be noted that if we subtract from (6) the similar quantity
defined for (k − 1)-dependent sequences we have two additive
quantities with degrees of freedom $s^{k-1}(s - 1)$ and

$$s^k(s - 1) - s^{k-1}(s - 1) = s^{k-1}(s - 1)^2.$$

However, the two components will only be χ²'s separately if the
sequence is not more than (k − 1)-dependent. If it is k-dependent,
the expectation of each component may be above or below
its nominal number of degrees of freedom, depending on the
transition-probability matrix. For example, anticipating the
detailed results given below for the case k = 1, s = 2, we find for
these asymptotic expected values (calculated for the standard
quadratic form for χ²) the analyses:


(i) $p_{11} = p_{21}$ (independence):
(3.0 − 2 × 1.0) + 1.0 = 1.0 + 1.0 = 2.0
(ii) $p_{11} = p_{22} = \tfrac{1}{3}$:
(2.5 − 2 × 0.5) + 0.5 = 1.5 + 0.5 = 2.0
(iii) p3 7$, P=}: 
(3.4 − 2 × 1.4) + 1.4 = 0.6 + 1.4 = 2.0.

For a Markov chain (k = 1) the simultaneous distribution of the
$N_{ur}$, which we can in this case write $N_{ij}$, may be investigated by
the method described in § 2-22. We replace the transition probability
matrix Q by

$$R(\theta) \equiv \{p_{ij} e^{\theta_{ij}}\} \tag{16}$$

and easily find for the cumulant function

$$K_n(\theta) \equiv \log E\{\exp \sum_{i,j} \theta_{ij} N_{ij}\}$$

the asymptotic form

$$K_n(\theta) \sim n \log \mu_1(\theta). \tag{17}$$

Here $\mu_1(0) = \lambda_1 = 1$ is the dominant root of Q (the chain is assumed
regular), and we assume further that $\mu_1(\theta) \neq 1$ if some $\theta_{ij} \neq 0$. This
last condition, which is required to ensure the validity of (17),
is automatically satisfied if $\mu_1(\theta) \neq 1$ for any $\theta_{ij} \neq 0$, which is
required if all the frequencies $N_{ij}$ are to be represented in (17);
and follows if all the $p_{ij}$ are non-zero, as already implicitly
assumed in the discussion on degrees of freedom. Under this last
condition we shall see further that the variances of the $N_{ij}$ are
asymptotically of the form $n\sigma_{ij}^2$, where $\sigma_{ij}^2$ is non-zero and finite,
and it follows from the Central Limit Theorem that the simultaneous
distribution of the $N_{ij}$ (appropriately scaled) is asymptotically
normal.

The distribution will be degenerate, owing to the linear
restrictions on the $N_{ij}$. However, all that we need to know to
calculate the criterion Λ are the expected values $m_{ij} \sim nP_i p_{ij}$,
where $P_i$ refers to the final distribution of X, and is given by the
column latent vector $s_1$ corresponding to the latent root 1. The
expected values $m_{ij}$ are asymptotically correct even if the chain
is not initially stationary, and only becomes so during the
observed sequence, but it should be an improvement in such
cases to substitute more exact expected values based on the
observed initial value. Moreover, for an initially stationary
process, the exact expected values are available, namely,
$(n - 1)P_i p_{ij}$ (based on the available number of transitions), or,
more generally for k-dependent chains, $(n - k)P_u p_{ur}$.

† It might be noticed that the essential feature is the unlimited increase with
n of the number of independent ‘trials’ starting from each specified state; in
this form the argument may be extended to chains with an enumerable but
‘effectively finite’ set of states, such as the emigration-immigration process.
If in (16) we put $p_{ij} = p_j$ we obtain the case of an independent
sequence (k = 0), such as a sequence of random numbers, and
we see that the above theory is required for an adequate discussion
even of this case, though this has often been overlooked.
In the case of k > 1, the only further point to make is that any
k-dependent chain with s possible states can be regarded as a
Markov chain in a variable with $s^k$ possible states, specified by
the set of k consecutive values of the original variable. This may
be illustrated by considering the case s = 2, k = 2. We have the
following transition probability matrix for the composite
variable with four possible states:


          1,1          1,2          2,1          2,2
1,1    $p_{111}$        0        $p_{211}$        0
1,2    $p_{112}$        0        $p_{212}$        0
2,1        0        $p_{121}$        0        $p_{221}$
2,2        0        $p_{122}$        0        $p_{222}$

Here $p_{hij}$ denotes the conditional probability of $X_{r+2} = j$ given
$X_r = h$, $X_{r+1} = i$, and the composite variable at time r + 2 is
specified by the two states (i, j). Since i also occurs in the composite
variable at time r + 1, the new composite variable can only
take values for which i remains constant; hence the zeros in the
above table.

This transformation to an equivalent Markov chain is of
course given to enable the asymptotic normality of the $N_{hij}$ (or
$N_{ur}$ in the abbreviated notation, with u = (h, i)) to be demonstrated
by the Markov chain technique; and does not affect the
previous argument about degrees of freedom. It may be verified
that the occurrence of zeros in the above transition matrix does
not affect the appearance of all the $p_{hij}$ in the equation for $\mu_1(\theta)$.

The non-vanishing of these permissible probabilities $p_{hij}$ is,
as in the case s = 2, a sufficient condition for the required regularity
property to hold. This may be seen in general for the
transition matrix R obtained from a k-dependent chain by considering
the matrix power $R^k$. This powering has the effect of
freeing the possible transitions between the states from the
artificial restrictions imposed by their definition, and all the
terms of $R^k$ become positive if the original probability coefficients
$p_{ur}$ are all positive. It now follows from one of the
conditions quoted in § 2-21 (see end of second paragraph, p. 34; 
this particular condition is due to Markov) that the process is 
positively regular. 
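The composite-state construction and the effect of powering can be checked directly. The sketch below builds the matrix for s = 2, k = 2 in the same column convention as the table (columns index the preceding composite state) and forms $R^k$; the helper names and the numerical values of the $p_{hij}$ are illustrative assumptions only.

```python
from itertools import product

def composite_matrix(p, s, k):
    """Matrix of the chain on k-tuples; each column sums to one.

    p[u + (r,)] is the probability of state r following the k-tuple u.
    The entry in row v, column u is non-zero only when v = u[1:] + (r,).
    """
    states = list(product(range(1, s + 1), repeat=k))
    index = {u: n for n, u in enumerate(states)}
    size = len(states)
    R = [[0.0] * size for _ in range(size)]
    for u in states:
        for r in range(1, s + 1):
            v = u[1:] + (r,)
            R[index[v]][index[u]] = p[u + (r,)]
    return states, R

def mat_mul(A, B):
    """Plain matrix product of two square lists of lists."""
    size = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(size)) for j in range(size)]
            for i in range(size)]

def mat_power(A, n):
    """A multiplied by itself n times (n >= 1)."""
    result = A
    for _ in range(n - 1):
        result = mat_mul(result, A)
    return result

# Hypothetical positive coefficients p_hij for s = 2, k = 2.
first = {(1, 1): 0.7, (1, 2): 0.4, (2, 1): 0.6, (2, 2): 0.2}
p = {}
for (h, i), value in first.items():
    p[(h, i, 1)] = value
    p[(h, i, 2)] = 1.0 - value

states, R = composite_matrix(p, s=2, k=2)
R_k = mat_power(R, 2)
```

With these values R has eight structural zeros, while every entry of `R_k` is a product of two positive coefficients and is therefore positive.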

The condition $p_{ur} > 0$ is necessary for the degrees of freedom
$s^k(s - 1)$ to be valid, but is not essential otherwise. If the degrees
of freedom are adjusted to allow for zero $p_{ur}$, weaker conditions
ensuring positive regularity are sufficient; for example, for
processes with no cyclic groups (no root $\mu_r(0) \neq 1$ of modulus
unity) but possessing ‘paths’ from any state to any other,
including itself, in a finite number of steps (see § 2-3; the absence
of cyclic groups is automatic if the chain is an intermittent
sequence obtained from a Markov chain defined for continuous
time). If this test is used as an approximation for sequences
with an enumerable but ‘effectively finite’ number of states,
such as the colloidal particle counts, an appropriate adjustment
of the degrees of freedom is of course also required.

Explicit evaluation of $\mu_1(\theta)$. An exact expression for $\mu_1(\theta)$ will
in general be impossible, but its expansion in ascending powers
of $\theta_{ij}$ can be investigated if required from the determinantal
equation for $\mu_1(\theta)$, starting from the solution 1 for θ = 0 (we may
thus verify that the coefficient of $\tfrac{1}{2}\theta_{ij}^2$ in the expansion of
$\log \mu_1(\theta)$ is non-zero and finite, this giving the variance of the
corresponding ‘scaled’ frequency $N_{ij}/\sqrt{n}$). In the case k = 1, s = 2,
however, we easily obtain (cf. the expression for the root in § 2-22)

$$\mu_1(\theta) = \tfrac{1}{2}(p_{11} e^{\theta_{11}} + p_{22} e^{\theta_{22}}) + \tfrac{1}{2}\sqrt{[(p_{11} e^{\theta_{11}} - p_{22} e^{\theta_{22}})^2 + 4 p_{12} p_{21} e^{\theta_{12} + \theta_{21}}]}.$$
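The closed form for the two-state root lends itself to a numerical check of the low-order terms of $\log \mu_1(\theta)$. The sketch below differentiates it by central differences at θ = 0; the function names, the step size and the choice $p_{11} = p_{22} = \tfrac{1}{2}$ (an independent sequence) are assumptions made for illustration.

```python
import math

def mu1(p11, p22, theta):
    """Dominant root of R(theta) for a two-state chain (closed form)."""
    t11, t12, t21, t22 = theta
    a = p11 * math.exp(t11)
    d = p22 * math.exp(t22)
    bc = (1.0 - p11) * (1.0 - p22) * math.exp(t12 + t21)
    return 0.5 * (a + d) + 0.5 * math.sqrt((a - d) ** 2 + 4.0 * bc)

def derivatives(p11, p22, h=1e-4):
    """Central-difference gradient and Hessian of log mu_1 at theta = 0.

    Components are ordered (theta_11, theta_12, theta_21, theta_22).
    """
    def f(theta):
        return math.log(mu1(p11, p22, theta))

    def point(i, si, j, sj):
        theta = [0.0] * 4
        theta[i] += si * h
        theta[j] += sj * h
        return theta

    grad = [(f(point(i, 1, i, 0)) - f(point(i, -1, i, 0))) / (2.0 * h)
            for i in range(4)]
    hess = [[(f(point(i, 1, j, 1)) - f(point(i, 1, j, -1))
              - f(point(i, -1, j, 1)) + f(point(i, -1, j, -1))) / (4.0 * h * h)
             for j in range(4)] for i in range(4)]
    return grad, hess

grad, hess = derivatives(0.5, 0.5)
```

For this choice each first derivative should be close to $P_i p_{ij} = \tfrac{1}{4}$, and the second derivative with respect to $\theta_{11}$ close to $p^2 q(1 + 3p) = 0.3125$, the per-observation variance of the count of (1, 1) pairs in an independent sequence. Each row of the second-derivative array sums to approximately zero, reflecting the fixed total of the $N_{ij}$.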
Up to the second degree in $\theta_{ij}$, the expansion of $\log \mu_1(\theta)$ is


P2112 


^. PuPa je? ( DioDoo ) 05,40 ( 
Piz + Pa Pit Par F lOa tha) Pret Par 


4.408, 1p, 4 ee PUP 4p3(Pu— P22)" | 

\ Prot Por (is Pa)” (Zi; P2 
PiP? dp 2 d 

(Py. + Pas)” (Piz + P21)? 


pio — bP P22 _ 
Prot Pa 

2 92, 942. 2. 
M On| "n Te E. DT E ores 

"ETE MC 
"ER NC 


Dor Pis Don papse Pu Pe?) PES 
00s On|- Um zx uH (Dia + Pa) 
The mean values given by the coefficients of $\theta_{ij}$ agree with the
values $P_i p_{ij}$, and it may be verified that the variance-covariance
matrix V given by the quadratic expression $\tfrac{1}{2}\theta' V \theta$ (in which θ'
stands for the row vector $(\theta_{11}, \theta_{12}, \theta_{21}, \theta_{22})$) has rank 2, in agreement
with the degrees of freedom s(s − 1) when k = 1, s = 2. The
variance matrices for the three particular cases referred to earlier
are recorded for reference (as $N_{12} \sim N_{21}$ the middle two rows and
columns are identical, and for convenience are not separated).


Case (i). py =P =P = 1-4 (independence): 


Ny Na Nn Ns 
patap) Irem 070 apg 
ava -opg+pe 0-30 7 2p? + Pa? J+ 
— 3pg? — 2pg? pg gpl + 34) 
Case (ii). p=} Pa=5: Case (iii). P117 brach 

13 -4 -ő G = c4 
7gay.| 54 4 —-4 125n7V ~ —14 8-2 
aaro 95 
The effect of estimating unknown parameters. As shown by
Fisher in the classical χ² frequency theory, the estimation of m
unknown parameters reduces the degrees of freedom by m, but
asymptotically has no other effect on the χ² distribution provided
that the estimation is fully efficient. The above results
show how these features apply to the extended problem. Suppose
(in our general notation) that the probabilities $p_{ur}$ are unknown
until m parameters $\alpha_i$ have been estimated. However, the
maximum-likelihood estimates of such $p_{ur}$ individually are
based on the $N_{ur}$, which tend to normality for large n; and,
moreover, their joint distribution may be regarded as arising
from the convolution of independent components. In these
circumstances these estimates $\hat p_{ur}$ are efficient. The estimates
of the $\alpha_i$ are similarly from (2) given asymptotically by the
equations

$$\sum_{u,r} N_{ur} \frac{\partial \log p_{ur}}{\partial \alpha_i} = 0, \tag{18}$$

which may be regarded both as m linear restrictions on the $N_{ur}$,
and as equations for $\alpha_i$ in terms of $\hat p_{ur}$ or of $N_{ur}$. It follows from
the first of these remarks that m degrees of freedom are lost, as
in the classical case; it also follows from the second that equation
(18) gives efficient estimates for $\alpha_i$, and that the asymptotic χ²
distribution, apart from this loss of degrees of freedom, is
preserved.


The efficiency properties of the maximum-likelihood estimates
$\hat p_{ur}$ provide an interesting alternative method of investigating
the asymptotic fluctuation formulae for the $N_{ur}$. For it follows
from the form of the likelihood function P{S} in (2), i.e.

$$L = \log P\{S\} \sim \sum_{u,r} N_{ur} \log p_{ur} \quad (\sum_r p_{ur} = 1),$$

that the information matrix for the $p_{ur}$, namely,

$$E\left\{-\frac{\partial^2 L}{\partial p_{ur}\, \partial p_{us}}\right\}$$

(which provides the inverse of the asymptotic variance matrix
for the estimates $\hat p_{ur}$), is asymptotically equivalent to that for
multinomial probabilities $p_{ur}$ (u fixed) from $E\{N_u\} = m_u$ independent
observations, with, moreover,

$$E\{-\partial^2 L / \partial p_{ur}\, \partial p_{ts}\} = 0 \quad (u \neq t).$$
This gives immediately the variances and covariances of the
$\hat p_{ur} = N_{ur}/N_u$ as standard multinomial formulae, with the extra
results cov($\hat p_{ur}$, $\hat p_{ts}$) ~ 0 (u ≠ t). We may now use the further
standard asymptotic formulae

$$n^2 P_u P_t \operatorname{cov}(\hat p_{ur}, \hat p_{ts}) \sim \operatorname{cov}(N_{ur}, N_{ts}) - p_{ts} \operatorname{cov}(N_{ur}, N_t) - p_{ur} \operatorname{cov}(N_u, N_{ts}) + p_{ur} p_{ts} \operatorname{cov}(N_u, N_t) \tag{19}$$

to obtain relations between cov($\hat p_{ur}$, $\hat p_{ts}$) and cov($N_{ur}$, $N_{ts}$),
leading to a set of linear equations to determine the latter in
terms of the former. While the number of equations in (19) is
nominally equal to the number of unknowns, some of these are
not algebraically independent. However, it will be found that
the degeneracies among the cov($N_{ur}$, $N_{ts}$) counterbalance this
so that a unique solution is in fact possible by this method. (It 
was used, for example, to check the covariance matrix for the 
N;; in the case of the Markov chain with s=2, 2475 2n73J 
This follows in general from the relation between the amount of
degeneracy and the degrees of freedom, which for the maximum-likelihood
estimates are obviously s−1 for each u, and hence
$s^k(s - 1)$ for all u (in agreement with the total number of degrees
of freedom deduced earlier). The number of independent
equations represented by (19) is thus not $\tfrac{1}{2}s^{k+1}(s^{k+1} + 1)$, but
$\tfrac{1}{2}s^k(s - 1)[s^k(s - 1) + 1]$.

Example. We are indebted to B. J. Prendiville for the following
numerical illustration of the above method. An artificial Markov chain
was constructed with the aid of Tippett's random numbers to correspond
to the transition probability matrix

    0.625   0.25    0.25
    0.25    0.5     0.375
    0.125   0.25    0.375

The first 150 values of the ‘state’ variable were: 


0122100000012221001000000 
1120021100000000000000111
1000021002100000011122002 
liiisiir3311220220122202901 2 
0000002221111011110021100
0002211111212000122200011 
The observed and expected frequencies for $N_{ur}$, given k = 1, are shown
in the table below (columns give the preceding state u, rows the
following state r; expected values in brackets):

             u = 0          u = 1          u = 2          Total
r = 0     51 (37.25)     12 (13.83)      6 (8.52)      69 (59.60)
r = 1     11 (14.90)     31 (27.67)     11 (12.77)     53 (55.34)
r = 2      8 (7.45)       9 (13.84)     10 (12.77)     27 (34.05)
Total     70 (59.60)     52 (55.34)     27 (34.05)    149 (149.00)

The log formula gives
χ² = 10.02 − 3.50 = 6.52 (6 degrees of freedom),
and the quadratic χ² formula gives the approximately equal value
χ² = 10.06 − 3.48 = 6.58,
indicating that the frequencies of the realized series accord very well with
expectation.
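The arithmetic of this example can be retraced from the transition matrix and the observed frequencies. In the sketch below the stationary distribution (14, 13, 8)/35 is obtained by solving QP = P for the matrix given; the variable names and the array layout (rows for the following state, columns for the preceding state) are assumptions of the sketch.

```python
import math

# Transition matrix: column u is the preceding state, row r the following one.
Q = [[0.625, 0.25, 0.25],
     [0.25, 0.5, 0.375],
     [0.125, 0.25, 0.375]]
# Observed transition frequencies N_ur in the same layout.
N = [[51, 12, 6],
     [11, 31, 11],
     [8, 9, 10]]
# Stationary distribution, solving Q P = P (assumed here, checked in the test).
P = [14 / 35, 13 / 35, 8 / 35]

n = sum(sum(row) for row in N)                       # 149 transitions
N_u = [sum(N[r][u] for r in range(3)) for u in range(3)]
m_u = [n * P[u] for u in range(3)]

lam_pairs = 0.0
chi_pairs = 0.0
for r in range(3):
    for u in range(3):
        m_ur = m_u[u] * Q[r][u]                      # expected value n P_u p_ur
        lam_pairs += 2.0 * N[r][u] * math.log(N[r][u] / m_ur)
        chi_pairs += (N[r][u] - m_ur) ** 2 / m_ur

lam_marg = sum(2.0 * N_u[u] * math.log(N_u[u] / m_u[u]) for u in range(3))
chi_marg = sum((N_u[u] - m_u[u]) ** 2 / m_u[u] for u in range(3))

lam = lam_pairs - lam_marg      # log form, as in (6)
chi2 = chi_pairs - chi_marg     # quadratic form, sums like (9)
print(round(lam, 2), round(chi2, 2))
```

The two differences should agree with the quoted 6.52 and 6.58 to within rounding of the tabulated expected values.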


B. J. Prendiville examined the adequacy of the emigration- 
and-immigration Markov chain model (equation (5), §3-41) to 
counts of colloidal particles by such methods. This model, while 
a useful first approximation, cannot be strictly correct for counts 
of particles which move continuously in space, owing to the non- 
Markovian character of grouped counts (see end of § 5-21), but 
rather extensive data are needed to detect any but large dis- 
crepancies. (One or two significant anomalies were detected 
in the colloidal particle counts, but no marked systematic 
discrepancy.) 

Another example of the use of this particular Markov model,
in the study of the movements of spermatozoa, is referred to
later.


8-21 Goodness of fit of marginal frequency distributions.
In the preceding theory the relevant frequencies
were the transition frequencies $N_{ur}$, but it often happens that
a marginal frequency distribution of, for example, $N_u$, is obtained
and compared with a theoretical law such as a Poisson or normal
distribution. The standard χ² theory, as was clear from particular
cases in § 8-2, no longer necessarily applies, and needs
re-examination.




From equation (12) of the last section, a modified χ² is in
principle available from the joint normal distribution of the $N_u$,
but requires the inversion of the covariance matrix V. In view
of this complication and the familiar use of the quadratic
expression

$$\chi_0^2 = \sum_u (N_u - m_u)^2 / m_u \tag{1}$$

for measuring goodness of fit, it seems useful to retain the
criterion (1) and to determine the effect on its asymptotic distribution
due to the stochastic process dependences. An examination
of this effect need not be confined to probability
chains, though the possible values of a continuous variable X
of course require final grouping before any χ² theory can be used.

Some theoretical results due to V. N. Patankar (1953) are
quoted for reference, though it should be added that they are
as yet incomplete, and do not include the modifications due to
estimating unknown parameters. For any fully specified process
grouped into k classes with frequencies $N_u$, the mean and variance
of $\chi_0^2$ may be evaluated. Thus

$$E\{\chi_0^2\} = \sum_{u=1}^{k} \frac{\sigma^2\{N_u\}}{m_u}, \tag{2}$$

and if the $N_u$ are asymptotically joint normal variables we easily
find also

$$\sigma^2\{\chi_0^2\} \sim 2 \sum_{u,v} \frac{w_{uv}^2}{m_u m_v}, \tag{3}$$

where $w_{uv}/\sqrt{(m_u m_v)}$ is the asymptotic covariance of $N_u/\sqrt{m_u}$ and
$N_v/\sqrt{m_v}$. For a normal stationary process, with marginal normal
distribution, these formulae become approximately (for equal
grouping intervals)

$$E\{\chi_0^2\} \sim k - 1 + 2 \sum_{s=1}^{\infty} \frac{\rho_s}{1 - \rho_s}, \tag{4}$$

$$\sigma^2\{\chi_0^2\} \sim 2(k - 1) + 8 \sum_{s=1}^{\infty} \frac{\rho_s}{1 - \rho_s} + 8 \sum_{s=1}^{\infty} \sum_{t=1}^{\infty} \frac{\rho_s \rho_t}{1 - \rho_s \rho_t}, \tag{5}$$

which become for a normal Markov process

$$E\{\chi_0^2\} \sim k - 1 + 2 \sum_{s=1}^{\infty} \frac{\rho_1^s}{1 - \rho_1^s}, \tag{6}$$

$$\sigma^2\{\chi_0^2\} \sim 2(k - 1) + 8 \sum_{s=1}^{\infty} \frac{\rho_1^s}{(1 - \rho_1^s)^2}. \tag{7}$$
An investigation of the emigration-and-immigration Markov
model showed that formulae (6) and (7) apply to this case also, 
at least to O(p}), and suggests that they may be of more general 
applicability. 

In this last case the marginal distribution is Poisson. An
observed marginal distribution of colloidal particle counts by
Westgren (quoted by Chandrasekhar, 1943), with theoretical
mean 1.428 and with $\rho_1 = 0.606$, is shown in the table below:


               0       1       2       3       4       5     6 or more    Total
Observed     381     568     357     175      67      28        7         1583
Expected   379.6   542.0   387.0   184.2    65.8    18.8      5.6       1583.0


We find from (6) and (7)

$$E\{\chi_0^2\} \sim 11.63, \quad \sigma^2\{\chi_0^2\} \sim 56.80.$$

The observed value of $\chi_0^2$ is 8.95, and as this is less than expectation
the fit is satisfactory. However, to illustrate an approximate
quantitative method of making use of standard χ² theory we
note that $A\chi_0^2/B$ has mean $f = A^2/B$ and variance 2f, where
$A = E\{\chi_0^2\}$, $2B = \sigma^2\{\chi_0^2\}$, and hence should be an approximate
χ² with f degrees of freedom. Here we obtain $A\chi_0^2/B = 3.606$ with
f = 4.6.
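The rescaling just described is a two-moment match, and a short sketch makes the bookkeeping explicit. The function name and the numbers in the usage lines are hypothetical and are not taken from the text.

```python
def scaled_chi2(x, mean, variance):
    """Rescale a statistic x with given mean A and variance 2B.

    Returns (A * x / B, f) with f = A**2 / B, so that the rescaled statistic
    has mean f and variance 2f, as for a chi-squared with f degrees of freedom.
    """
    b = variance / 2.0
    f = mean * mean / b
    return mean * x / b, f

# Hypothetical inputs: A = 10, 2B = 40, observed statistic 8.
stat, f = scaled_chi2(8.0, mean=10.0, variance=40.0)
```

With these inputs the rescaled statistic is 4.0 on 5.0 effective degrees of freedom; f is generally fractional, so a χ² table has to be interpolated.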

The formulae quoted above have also been extended to apply 
to marginal distributions obtained from two-dimensional 
stochastic processes. 


8-3 Estimation problems


In this section we shall indicate the use of the likelihood 
function in some particular stochastic process estimation pro- 
blems; the autoregressive estimates used in the correlation 
analysis of the next chapter are, however, not dependent on a 
complete specification of the likelihood function, and for the 
moment are only referred to incidentally. 

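Normal Markov sequence. For a stationary linear sequence
$X_0, X_1, \ldots, X_n$, where

$$X_r = \beta X_{r-1} + Y_r, \tag{1}$$

the $Y_r$ being independent normal variables with zero mean and
variance $\sigma_Y^2$, the log likelihood function is, apart from a constant,

$$L = -\tfrac{1}{2} \log \sigma_X^2 - \frac{n}{2} \log \sigma_Y^2 - \frac{X_0^2}{2\sigma_X^2} - \sum_{r=1}^{n} \frac{(X_r - \beta X_{r-1})^2}{2\sigma_Y^2} \tag{2}$$

$$\sim -\frac{n}{2} \log \sigma_Y^2 - \sum_{r=1}^{n} \frac{(X_r - \beta X_{r-1})^2}{2\sigma_Y^2}, \tag{3}$$

the neglect of the end-correction in (3) having a relative error
only of O(1/n). We have

$$\frac{\partial L}{\partial \beta} = \sum_{r=1}^{n} \frac{(X_r - \beta X_{r-1}) X_{r-1}}{\sigma_Y^2}, \tag{4}$$

whence

$$\hat\beta = \sum_{r=1}^{n} X_r X_{r-1} \Big/ \sum_{r=1}^{n} X_{r-1}^2. \tag{5}$$

We have further

$$I(\beta) = E\left\{-\frac{\partial^2 L}{\partial \beta^2}\right\} = \frac{n}{1 - \beta^2}. \tag{6}$$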


It is known, as will be referred to again in the next chapter, that
the classical asymptotic properties for the expressions in (4)
or (5) still hold, so that the asymptotic standard error of $\hat\beta$ is
$\sqrt{\{(1 - \beta^2)/n\}}$. We have also if required

BL , OI(f) 2ng E enf . " 

m= peace Plae O 
thus a more accurate confidence interval for β, with allowance
for the skewness of ∂L/∂β, should be determined from the relevant
roots of the equation


x Xv a " n 
E Gp) FRO ya aJa) A 


(where λ = ±1.96 for upper and lower 0.025 probability limits).
It is interesting to notice the appearance of the skewness term
in (8), in contrast with the classical regression case.
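The estimate (5) and the standard error implied by (6) are simple to compute. The sketch below is only an illustration: the function names are assumptions, and the standard error uses the first-order asymptotic form without the skewness correction of (8).

```python
import math

def beta_hat(x):
    """Estimate (5): sum of X_r X_{r-1} over sum of X_{r-1}^2, r = 1..n."""
    num = sum(x[r] * x[r - 1] for r in range(1, len(x)))
    den = sum(x[r - 1] ** 2 for r in range(1, len(x)))
    return num / den

def asymptotic_se(beta, n):
    """Standard error sqrt((1 - beta^2) / n) following from I(beta) in (6)."""
    return math.sqrt((1.0 - beta * beta) / n)

# Noise-free geometric decay, for which (5) recovers beta exactly.
estimate = beta_hat([1.0, 0.5, 0.25, 0.125])
se = asymptotic_se(0.5, 300)
```

In practice β in the standard error would be replaced by its estimate, and approximate limits taken as the estimate plus or minus 1.96 standard errors.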

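It has been assumed here that $\sigma_Y^2$ is known. If it is unknown,
the estimation equations must be extended to include $\alpha \equiv \sigma_Y^2$,
say. It will be found that

$$\frac{\partial L}{\partial \alpha} = \frac{n}{2\alpha^2} \left\{\sum_{r=1}^{n} \frac{(X_r - \beta X_{r-1})^2}{n} - \alpha\right\},$$

whence $I_{\alpha\beta} = 0$, $I_{\alpha\alpha} = n/(2\alpha^2)$,

and

$$\hat\alpha(\beta) = \sum_{r=1}^{n} \frac{(X_r - \beta X_{r-1})^2}{n}. \tag{9}$$

With $I_{\alpha\beta} = 0$, the confidence interval obtained from (8) may be
shown not to be affected to its relative accuracy $O(1/\sqrt{n})$ by the
substitution of $\hat\alpha(\beta)$ in (8) for $\sigma_Y^2$ (if $I_{\alpha\beta} \neq 0$, the variance of
$[\partial L/\partial \beta]_{\alpha = \hat\alpha(\beta)} \sim I_{\beta\beta} - I_{\alpha\beta}^2/I_{\alpha\alpha}$). If the mean m = E(X_r) is unknown
and thus not necessarily zero, this must of course also be
estimated, as in the example below.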


Continuous time case. Consider the analogous problem in
continuous time

$$dX(t) + \mu X(t)\,dt = dZ(t), \tag{10}$$

where Z(t) is a normal additive process. This example has been
discussed by Grenander (1950), who, however, considered only
the estimation of the mean m of X(t). We shall accordingly
suppose more generally that the intermittent observations
satisfy an equation similar to (1), but with a non-zero mean m
included, i.e.

$$X_r - m = \beta(X_{r-1} - m) + Y_r, \tag{11}$$

where, for observations at intervals Δt, β = exp(−μΔt). It appears
relevant in most practical problems where autoregressive
schemes of the type (10) are used to assume that the underlying
disturbances dZ(t) or $Y_r$ are of constant variance increment σ²
per unit time. Then $\sigma_X^2 = \sigma^2/(2\mu)$. For

$$S_n \equiv X_0, X_1, X_2, \ldots, X_n$$

the log likelihood function now becomes, apart from a constant,

$$L(S_n \mid m, \mu, \sigma^2) = -\tfrac{1}{2} \log \sigma_X^2 - \frac{n}{2} \log \sigma_Y^2 - \frac{(X_0 - m)^2}{2\sigma_X^2} - \frac{1}{2\sigma_Y^2} \sum_{r=1}^{n} [X_r - \beta X_{r-1} - m(1 - \beta)]^2,$$

where $\sigma_Y^2$ is, as in (1), the variance of $Y_r$. If we now let Δt → 0, we
must first stabilize the log likelihood function by the subtraction
of $L(S_n \mid H_0)$, where $H_0$ is an appropriate invariant hypothesis.
We defer the question of the estimation of σ², and subtract
$L(S_n \mid 0, 1, \sigma^2)$. This gives

$$\lim_{\Delta t \to 0} [L(m, \mu, \sigma^2) - L(0, 1, \sigma^2)] = -\frac{\mu[X(0) - m]^2}{\sigma^2} + \frac{X^2(0)}{\sigma^2} - \frac{1}{\sigma^2}\left\{\mu \int_0^T [X(t) - m]\,dX(t) + \tfrac{1}{2}\mu^2 \int_0^T [X(t) - m]^2\,dt\right\} + \frac{1}{\sigma^2}\left\{\int_0^T X(t)\,dX(t) + \tfrac{1}{2} \int_0^T X^2(t)\,dt\right\} + \tfrac{1}{2} \log \mu.$$

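Thus (for the limit as Δt → 0)

$$\frac{\partial L}{\partial m} = \frac{\mu(2 + \mu T)}{\sigma^2} \left\{\frac{X(0) + X(T) + \mu \int_0^T X(t)\,dt}{2 + \mu T} - m\right\}, \tag{12}$$

$$\frac{\partial L}{\partial \mu} = \frac{1}{2\mu} - \frac{[X(0) - m]^2}{\sigma^2} - \frac{1}{\sigma^2} \int_0^T [X(t) - m]\,dX(t) - \frac{\mu}{\sigma^2} \int_0^T [X(t) - m]^2\,dt. \tag{13}$$

From the form of equation (12) we see that if μ is known, then
the optimum unbiased estimate of m is

$$\hat m = \frac{X(0) + X(T) + \mu \int_0^T X(t)\,dt}{2 + \mu T}, \tag{14}$$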

and that its variance is $\sigma^2/[\mu(2 + \mu T)]$. For large T the estimate
becomes asymptotically the mean $\int_0^T X(t)\,dt/T$ with asymptotic
variance $\sigma^2/(\mu^2 T)$. If μ is not known, we should be obliged to
substitute an estimate obtained from (13), and as the pair of
equations is not of the form required for minimum variance, no
advantage over the asymptotic estimate is necessarily gained.
It will be noticed further that the precise estimate of μ from (13)
requires a knowledge of σ²; however, for large T the first two
terms are neglected (cf. the first example) and the asymptotic
estimate

$$\hat\mu = -\int_0^T [X(t) - m]\,dX(t) \Big/ \int_0^T [X(t) - m]^2\,dt \tag{15}$$

is obtained (in which, if m is unknown, we substitute its asymptotic
estimate above). This we shall see is equivalent to the
asymptotic least-squares estimate, and its asymptotic variance
is $1/I_{\mu\mu} \sim 2\mu/T$.
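For a record sampled on a fine grid, the estimates (14) and (15) can be approximated by sums. The sketch below assumes equally spaced values X(0), X(Δt), ..., X(T), a trapezoidal rule for the time integral, and forward increments for the integral against dX(t); the function names and these discretization choices are assumptions, not part of the theory above.

```python
def mean_estimate(x, dt, mu):
    """Discretized form of (14); x holds X(0), X(dt), ..., X(T)."""
    T = dt * (len(x) - 1)
    # trapezoidal approximation to the integral of X(t) over (0, T)
    integral = dt * (sum(x) - 0.5 * (x[0] + x[-1]))
    return (x[0] + x[-1] + mu * integral) / (2.0 + mu * T)

def mu_estimate(x, dt, m):
    """Discretized form of (15), pairing X(t) - m with the next increment of X."""
    num = sum((x[r - 1] - m) * (x[r] - x[r - 1]) for r in range(1, len(x)))
    den = sum((x[r - 1] - m) ** 2 for r in range(1, len(x))) * dt
    return -num / den

def mean_estimate_variance(sigma2, mu, T):
    """Variance sigma^2 / [mu (2 + mu T)] of the estimate (14), mu known."""
    return sigma2 / (mu * (2.0 + mu * T))

# Deterministic checks: a constant record, and a noise-free geometric decay.
m_hat = mean_estimate([3.0] * 11, dt=0.1, mu=2.0)
decay = [0.9 ** r for r in range(20)]
mu_hat = mu_estimate(decay, dt=0.1, m=0.0)
```

For the geometric decay the discretized estimate returns (1 − β)/Δt with β = 0.9, which approaches μ = −log β/Δt only as Δt tends to zero.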

In the above equations we have not considered the estimation
of σ², for this leads to difficulties which are strictly due to the
above formulation becoming unrealistic in the limit as Δt → 0.
We should find a term

$$\lim_{\Delta t \to 0} \sum \frac{(\Delta X)^2}{T}$$

arising, which, as (ΔX)² is of order Δt, exists but could hardly
be evaluated in practice. A similar situation arises in the model
for ordinary Brownian motion, where a variance estimate could
theoretically be based on such a limit and be determined exactly
for any finite period (being based on an infinite number of degrees
of freedom). This impasse is associated with the assumption of
a normal disturbance term, and would not arise if the disturbances
were assumed to be of more general additive type. However,
if they were purely of transition type, they would give rise
to a finite number of discontinuities in any finite interval T,
and the parameter μ could be measured exactly from the
observed decay in X(t) between such jumps! It is not impossible
for processes of this kind to arise in some physical situations, but
in other contexts the model would be an over-idealized one, and
estimates not dependent on such a precise use of the likelihood
function would be preferable. In the above Markov linear process
the simple estimate $2\hat\mu \int_0^T [X(t) - \hat m]^2\,dt/T$ for σ² will usually
be adequate.

As (14) gives (for μ known) the unbiased estimate of m with
minimum variance when Z(t) in (10) is normal, it will also provide
a linear estimate with the same mean and variance properties
for any stationary process with the same autocorrelation function
exp(−μτ). For other such processes it is not, however, necessarily
the optimum estimate, and it may be possible to find a better
estimate by making use of the correct likelihood function. An
example given by Grenander is the following.

A simple Poisson additive process with events occurring at
average rate μ has associated independent normal variables $Z_r$
at each time of occurrence $T_r$. The process X(t) is defined as $Z_0$
$(0 \le t < T_1)$, $Z_1$ $(T_1 \le t < T_2)$, .... As the contribution to the
autocorrelation for an interval Δt is 1 with probability exp(−μΔt),
and 0 with probability 1 − exp(−μΔt), this process has the
required correlation function. The distribution is moreover
normal at each point t, but is not of course a normal process
identical with (10), as we shall see by setting up its likelihood
function and obtaining a fully efficient estimate of m. Let the
number of occurrences in (0, T) be N, a Poisson variable with
mean μT. When N is n, say, the distribution of the times of
occurrence $T_1, T_2, \ldots, T_n$ is uniform in (0, T) and does not provide
any information on m or μ. Using the method of representation
(2) in § 8-11, we have


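$$f(S \mid m, \mu) = \frac{(\mu T)^n e^{-\mu T}}{n!\,(2\pi\sigma_Z^2)^{\frac{1}{2}(n+1)}} \exp\left[-\frac{1}{2\sigma_Z^2} \sum_{r=0}^{n} (Z_r - m)^2\right],$$

where $\sigma_Z^2$ is the variance of the $Z_r$. Hence

$$L(m, \mu) = N \log \mu - \mu T - (N + 1) \log \sigma_Z - \frac{1}{2\sigma_Z^2} \sum_{r=0}^{N} (Z_r - m)^2 + \text{constant},$$

$$\frac{\partial L}{\partial m} = \frac{1}{\sigma_Z^2} \left[\sum_{r=0}^{N} Z_r - (N + 1)m\right], \tag{16}$$

$$\frac{\partial L}{\partial \mu} = \frac{N}{\mu} - T. \tag{17}$$

Equation (16) is not, with N variable, of the required form to
provide an unbiased estimate with minimum variance. However,
the maximum-likelihood estimate of m is

$$\hat m = \sum_{r=0}^{N} Z_r \Big/ (N + 1), \tag{18}$$

with asymptotic variance

$$\frac{\sigma_Z^2}{E\{N + 1\}} = \frac{\sigma_Z^2}{1 + \mu T} \sim \frac{\sigma_Z^2}{\mu T}. \tag{19}$$

As for this process $\sigma_X^2 = \sigma_Z^2$, the linear estimate $\int_0^T X(t)\,dt/T$, with
asymptotic variance $2\sigma_Z^2/(\mu T)$, has a limiting efficiency, measured
by the ratio of these variances, of ½. (Formula (18) is not a ‘linear
estimate’ in X(t), as it is an integral over X(t) with weights
depending on the realization.)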
For $\mu$, (17) gives $\hat\mu = N/T$, with variance $\mu/T$. The ‘least-squares’
estimate (15) is rather irrelevant here, as equation (10)
no longer holds. The estimate of $\sigma_Z^2$ is obviously

$$\sum_{r=0}^{N}(Z_r-\hat m)^2/(N+1).$$
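The estimates for this example can be sketched numerically. The following is a minimal illustration, not part of the original treatment: the function names `simulate` and `estimates` are hypothetical, the process is assumed to be fully observed on $(0, T)$, and the time average is included only for comparison with the maximum-likelihood mean.

```python
import math
import random

def simulate(m, sigma, mu, T, rng):
    """Return jump times T_1..T_N in (0, T) and the levels Z_0..Z_N."""
    times = []
    t = rng.expovariate(mu)              # exponential gaps give Poisson events
    while t < T:
        times.append(t)
        t += rng.expovariate(mu)
    levels = [rng.gauss(m, sigma) for _ in range(len(times) + 1)]
    return times, levels

def estimates(times, levels, T):
    """Maximum-likelihood estimates and the linear (time-average) estimate of m."""
    n = len(times)
    m_ml = sum(levels) / (n + 1)                       # mean of Z_0..Z_N
    mu_ml = n / T                                      # N / T
    var_z = sum((z - m_ml) ** 2 for z in levels) / (n + 1)
    # time average: each level is weighted by the length of its interval
    edges = [0.0] + list(times) + [T]
    m_lin = sum(z * (b - a) for z, a, b in zip(levels, edges, edges[1:])) / T
    return m_ml, mu_ml, var_z, m_lin
```

Repeating `estimates(*simulate(0.0, 1.0, 2.0, 50.0, random.Random(seed)), 50.0)` over many seeds gives an empirical comparison of the spread of the two estimates of $m$.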


The ‘emigration-immigration’ process. This process (defined
by equation (5), § 3-41) deserves some consideration in view of
its practical applications. We saw in §6-31 that the regression
of $X(t)$ on $X(0)$ was linear, and for large mean $m = \nu/\mu$, the
Poisson marginal distribution becomes approximately normal
and equation (1) above would approximately hold for observational
counts made at unit intervals, with $\beta = e^{-\mu}$. However, the
estimate $\hat\beta$ in (5) would still not be the maximum-likelihood
estimate of $\beta$, for we have additional information that $\sigma_X^2 = m$.

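The exact likelihood function for the sequence $X_0, X_1, \ldots, X_n$
is set up as

$$P(X_0)\prod_{r=1}^{n} P(X_r \mid X_{r-1}),$$

and with the usual neglect of the first term, we obtain

$$L(m,\mu) = \sum_{r=1}^{n}\log p(X_r \mid X_{r-1}),$$

where it will be found that $p(X_r = i \mid X_{r-1} = j)$ is given by the
rather awkward sum

$$Q_{ij} = e^{-m(1-\beta)}\sum_{s=0}^{\min(i,j)} \frac{m^{i-s}(1-\beta)^{i+j-2s}\beta^{s}\,j!}{(i-s)!\,(j-s)!\,s!}. \tag{20}$$

We thus have

$$\frac{\partial Q_{ij}}{\partial m} = \Big[\frac{i}{m}-(1-\beta)\Big]Q_{ij} - \frac{R_{ij}}{m}, \qquad \frac{\partial Q_{ij}}{\partial\beta} = \Big[m-\frac{i+j}{1-\beta}\Big]Q_{ij} + \frac{1+\beta}{\beta(1-\beta)}R_{ij},$$

where

$$R_{ij} = e^{-m(1-\beta)}\sum_{s=1}^{\min(i,j)} \frac{m^{i-s}(1-\beta)^{i+j-2s}\beta^{s}\,j!}{(i-s)!\,(j-s)!\,(s-1)!}.$$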

The maximum-likelihood equations for $m$ and $\beta$ are thus (cf.
Patankar, 1953)

$$\sum\Big[\frac{i}{m}-(1-\beta)-\frac{R_{ij}}{mQ_{ij}}\Big]=0, \tag{21}$$

$$\sum\Big[m-\frac{i+j}{1-\beta}+\frac{1+\beta}{\beta(1-\beta)}\frac{R_{ij}}{Q_{ij}}\Big]=0, \tag{22}$$

where $\sum$ denotes summation over all pairs $i, j$ in the sample
(each pair occurring its appropriate number of times). From (21),

$$\hat m = \frac{\sum X_i - \sum (R_{ij}/Q_{ij})}{n(1-\hat\beta)}, \tag{23}$$

and from (22)

$$n\hat m - \frac{2\sum X_i}{1-\hat\beta} + \frac{1+\hat\beta}{\hat\beta(1-\hat\beta)}\sum(R_{ij}/Q_{ij}) = 0,$$

as $\sum X_i = \sum X_j$ (apart from a possible end-effect which we are
neglecting). Substituting for $\hat m$ in this last equation, we obtain

$$\hat\beta = \sum(R_{ij}/Q_{ij})\Big/\sum X_i, \tag{24}$$

and inserting this value in (23) obtain also

$$\hat m = \sum X_i/n. \tag{25}$$

The last estimate for $m$ is simply the sample mean, as might be
expected. The estimate of $\beta$ in (24) is awkward to use, in view of
the complicated expressions above for $R_{ij}$ and $Q_{ij}$, and it is
usually convenient to adopt simpler estimates.
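The awkward sum can at least be evaluated mechanically. The sketch below assumes the transition law is that of $j$ individuals each surviving with probability $\beta$, plus an independent Poisson number of arrivals with mean $m(1-\beta)$, which is the form the sum for $Q_{ij}$ takes. The names `q_and_r` and `fit` are hypothetical, and the repeated substitution used in `fit` is only one simple way to iterate on $\beta$, with no guarantee of convergence claimed.

```python
import math

def q_and_r(i, j, m, beta):
    """Q_ij = P(X_r = i | X_{r-1} = j), and R_ij (the same sum weighted by s)."""
    q = r = 0.0
    lam = m * (1.0 - beta)               # mean number of arrivals per interval
    for s in range(min(i, j) + 1):       # s = survivors among the j present
        survive = math.comb(j, s) * beta ** s * (1.0 - beta) ** (j - s)
        arrive = math.exp(-lam) * lam ** (i - s) / math.factorial(i - s)
        q += survive * arrive
        r += s * survive * arrive
    return q, r

def fit(counts, beta=0.5, iterations=200):
    """Sample mean for m, and repeated substitution beta <- sum(R/Q) / sum(X_i)."""
    pairs = list(zip(counts[1:], counts[:-1]))     # (i, j) = (X_r, X_{r-1})
    total = sum(i for i, _ in pairs)
    m_hat = total / len(pairs)
    for _ in range(iterations):
        expected = 0.0
        for i, j in pairs:
            q, r = q_and_r(i, j, m_hat, beta)
            expected += r / q
        beta = expected / total
    return m_hat, beta
```

The ratio `r / q` is the conditional mean number of survivors given the pair $(i, j)$, which is why it never exceeds $\min(i, j)$.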

By proceeding as in the first example, but with the restriction
$\sigma_X^2 = m$, it is possible to obtain estimates efficient at least for
large $m$. In addition to the estimate (25), the equation is obtained
(we omit the details of the deduction)


jC-xe- ya - fe - 

m 
where $C$ is the observed covariance between consecutive observations
and $V$ the observed variance. As in large samples
$\hat m \to m = \sigma_X^2$, $C - \beta V \to 0$ (both in probability), $\hat\beta$ is evidently a
consistent estimate for any value of $m$; it could be found by
iteration. From the asymptotic theory, the validity of which
will not be affected by the restriction $\sigma_X^2 = m$, it is found further
that

$$\sigma^2(\hat m) \sim \frac{m(1+\beta)}{n(1-\beta)}, \tag{27}$$

a result which is easily shown directly to be still true for finite $m$
(cf. § 9-1), and

$$\sigma^2(\hat\beta) \sim \frac{(1-\beta^2)^2}{(1+\beta^2)\,n}, \tag{28}$$


this last result only established for large $m$. Rothschild (1953),
in an interesting use of this model to estimate sperm speeds
from counts of consecutive numbers in cinemicrographs, has
employed the estimate

$$\tilde\beta = 1 - \tfrac12\,\delta^2/\bar X, \tag{29}$$

where

$$\delta^2 = \frac{1}{n}\sum_{r=1}^{n}(X_r - X_{r-1})^2.$$

This estimate is also obviously consistent for any $m$, and Rothschild
has given its asymptotic variance (obtainable from the
joint moments of $X_r$ up to the fourth order) as

$$\sigma^2(\tilde\beta) \sim \frac{(1-\beta)^2(3+\beta)}{n(1+\beta)} + \frac{\beta(1-\beta)}{nm}. \tag{30}$$

For variable time interval $\Delta t$ between observations, we had
$\beta = e^{-\mu\Delta t}$. It is the quantity $\mu$ rather than $\beta$ which is required
(being directly proportional to the average sperm speed), and
as for large $n$

$$\frac{\sigma^2(\mu_1)}{\mu^2} \sim \frac{\sigma^2(\beta_1)}{(\beta\log\beta)^2}$$

for any consistent estimate $\beta_1$ (with $\mu_1 = -\log\beta_1/\Delta t$), we require
first to divide (28) or (30) by $(\beta\log\beta)^2$ when comparing the
efficiency of estimating $\mu$ for different $\Delta t$. It will be seen from the
graph (fig. 9) that for large $m$, $\sigma^2(\tilde\mu)$ compares reasonably with
$\sigma^2(\hat\mu)$ for $\beta > 0.5$. From an examination of $\sigma^2(\tilde\mu)$ for varying $\beta$
when $m$ is small, it appears dangerous to go above about $\beta = 0.9$,
so that a useful range appears to be (0.5, 0.9).

The Markov chain model (20) in this particular application is, 
as in the colloidal particle applications (see § 8-2 at end), somewhat
over-simplified,† so that it is advisable to make a test of
the adequacy of the fit. The data available were of series of about 
100 consecutive counts, and were not extensive enough to permit 
a full analysis of the type discussed in §8-2. The correlation 
function could still be tested, however, by the technique to be 
developed in Chapter 9, where a further reference to this par- 
ticular application will be found (§ 9-13). 

The ‘birth-and-death’ process. The three previous examples
exhibit the common feature that they refer to stationary ergodic
processes for which the asymptotic properties of the log likelihood
function (or ratio) and maximum-likelihood estimates still
apply (cf. §9-1 for the first two examples, and §8-2 for the
third). The last example, which could equivalently have been
called a ‘death-and-immigration’ process, was a Markov chain
of positively regular type owing to the existence of immigration;
a stationary ‘birth-death-and-immigration’ process would be
more complicated, but not present any new features. However,
for processes which are not of a positively regular type owing to
the presence of absorbing states such asymptotic properties
may well break down; for example, a simple birth-and-death
process starting from one individual will terminate at the first
occurrence if this is a death. The distribution problem will,
moreover, not only be aggravated by such irrevocable ‘stops’,
but also by the accelerating action of births in a multiplicative
process, so that in a given interval of time $T$ the number of
births is liable to be a very unstable quantity.

Fig. 9. The variance $\sigma^2(\hat\mu)$ of the maximum-likelihood estimate $\hat\mu$ for the
emigration-immigration process compared for large $m$ with $\sigma^2(\tilde\mu)$, where $\tilde\mu$ is
the estimate based on the mean-square difference. The variance of $\tilde\mu$ for $m=1$
and $m=10$ is also shown.

† Cf. the discussion by Rothschild and Ruben (1956), who consider also
the use of a multi-region model.
The simple birth-and-death process will be briefly examined
in view of these difficulties, as while it seems doubtful whether it 
will often be used without modification as a realistic model of 
actual events, it helps to indicate a possible method of attacking 
the estimation problem in similar cases. We consider the case 
where the full time record is available, and the representation (2) 
of §8-11 employed. To try to avoid the complications noted 
above, we adopt a sampling rule determined by a fixed number of 
occurrences n, and not by a predetermined time interval T. 
(The theory of such ‘non-classical’ sampling procedures is now 
better known, the sequential sampling of §4-1, and ‘inverse 
sampling’ in which trials are continued until an event of pro- 
bability p has occurred a predetermined number of times, being 
familiar examples.) 

The interval $\Delta T_r$ between the $r$th and $(r+1)$th events will
depend on the population size $N(t)$ at time $T_r$, the interval $\Delta T_r$
having the exponential distribution with mean $1/[(\lambda+\mu)N(T_r)]$.
The $(r+1)$th event will, moreover, have probability $\mu/(\lambda+\mu)$ of
being a death. As any actual realization can be generated from
these facts, they must determine the likelihood function.
Regarding $\lambda+\mu$ and $\mu/(\lambda+\mu)$ as new unknowns $\theta$ and $\phi$, say,
we have:

(a) $2\theta N(T_r)\,\Delta T_r$ is distributed as a $\chi^2$ random variable
independently of $N(T_r)$ with two degrees of freedom (for $\tfrac12\chi^2$ with
two degrees of freedom has the exponential distribution with
mean unity). Hence

$$2\theta\sum_{r=0}^{n-1} N(T_r)\,\Delta T_r = 2\theta\int_0^{T_n} N(t)\,dt$$

is a $\chi^2$ quantity with $2n$ degrees of freedom, enabling $\theta$ to be
estimated, e.g. an exact confidence interval can be assigned to $\theta$;

(b) if there are $D$ deaths, then $D$ is a binomial variable in
$n$ independent trials with probability $\phi$, whence $\phi$ may be
estimated.

The validity of these estimation equations does not necessarily
depend on $n$ being fixed, but does depend on the sampling
distributions used being independent of any selection on $n$.
For example, (a) will still hold if the number $N$ of occurrences
is the sum of a prescribed number $d$ of deaths and the random
number of births in the consequent interval $T_N$, but not if,
alternatively, $T_N$ is restricted to be less than $T$, say. On the
other hand, a switch in the ‘stop rule’ to a given number of
deaths will affect the distribution of $D/N$. In (b) we must have
$n \le N(0)$, to ensure that the rule does not break down through
the occurrence of extinction, but more information is obtained
if we take the number $d = N(0)$. The sampling distribution in (b)
is then changed to one of inverse sampling until there are $d$
occurrences with probability $\phi$. The probability of $N=n$ is
now the probability $\phi$ of a death at the last occurrence $\times$ the
probability of $d-1$ deaths in $n-1$ occurrences, i.e.

$$P\{N=n \mid d\} = \frac{(n-1)!}{(d-1)!\,(n-d)!}(1-\phi)^{n-d}\phi^{d} \quad (n \ge d), \tag{31}$$

a type of negative binomial distribution. The log likelihood
derivative in case (b) and for (31) has the same mathematical
form, but a different sampling distribution. It may be written
in the standard form in case (b)

$$\frac{\partial L}{\partial\phi} = \frac{n}{\phi(1-\phi)}\Big[\frac{D}{n} - \phi\Big], \tag{32}$$

showing that $D/n$ is an unbiased optimum estimate of $\phi$ with
variance $\phi(1-\phi)/n$, and in the alternative form

$$\frac{\partial L}{\partial\phi^{-1}} = \frac{d\phi^2}{1-\phi}\Big[\frac{N}{d} - \phi^{-1}\Big], \tag{33}$$

for (31), showing that $N/d$ provides an unbiased optimum
estimate of $\phi^{-1}$ with variance $(1-\phi)/(\phi^2 d)$. The information on
$\phi$ from (33) is $I(\phi^{-1})/\phi^4 = d/[\phi^2(1-\phi)]$, which for $d=N(0)$ is
greater than the information on $\phi$ in case (b), with $n = N(0)$.
The information on $\theta$ by method (a) is $n/\theta^2$, and is of course
greatest for $n$ as large as possible, subject to the sampling
requirements already indicated.
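The inverse-sampling distribution can be checked numerically. The sketch below assumes the negative binomial form described above (the $d$th death falling on the $n$th event); the function names and the truncation point `n_max` are illustrative choices, not taken from the text.

```python
import math

def prob_n(n, d, phi):
    """P{N = n | d}: the d-th death occurs at the n-th event."""
    if n < d:
        return 0.0
    return math.comb(n - 1, d - 1) * (1.0 - phi) ** (n - d) * phi ** d

def moments_of_ratio(d, phi, n_max=2000):
    """Mean and variance of N/d, summing the distribution up to n_max."""
    mean = sum(n / d * prob_n(n, d, phi) for n in range(d, n_max))
    second = sum((n / d) ** 2 * prob_n(n, d, phi) for n in range(d, n_max))
    return mean, second - mean ** 2

def information_on_phi(d, phi):
    """Information on phi: inverse sampling to d deaths, and binomial with n = d."""
    inverse = d / (phi ** 2 * (1.0 - phi))
    binomial = d / (phi * (1.0 - phi))
    return inverse, binomial
```

With $d = 5$ and $\phi = \tfrac12$ the sums reproduce the mean $1/\phi$ and the variance $(1-\phi)/(\phi^2 d)$ quoted for $N/d$, and the first information value exceeds the second by the factor $1/\phi$.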

We may illustrate this method of estimation by reference to
the artificial birth-and-death series shown in fig. 4, §3-4. Thus
for one of the three series shown the realization up to the 5th
death is shown in detail in table 3. The true values of $\lambda$ and $\mu$
in this case were unity, so that $\theta=2$ and $1/\phi=2$. Consistently
with these values, we have $2\theta \times 3.546 = 14.184$ (10) as a $\chi^2$ with
16 degrees of freedom, and

$$\hat\phi^{-1} = 8/5 = 1.6, \qquad \sigma(\hat\phi^{-1}) = [(1-\phi)/5]^{1/2}/\phi = 0.63.$$

Similar calculations for the other two series gave (i) 34.104 (30)
with 30 degrees of freedom and $\hat\phi^{-1} = 3.0 \pm 0.63$, (ii) 34.220 (30)
with 28 degrees of freedom and $\hat\phi^{-1} = 2.80 \pm 0.63$. When $\theta$ and $\phi$
are unknown, confidence intervals can alternatively be obtained
for them from such data.


Table 3

Event   Interval   Time    Birth (B) or   Population   N(t) ΔT
          (ΔT)      (T)     death (D)        N(t)
  1      0.208     0.208        D              5         1.040
  2      0.140     0.348        D              4         0.560
  3      0.063     0.411        D              3         0.189
  4      0.158     0.569        B              2         0.316
  5      0.240     0.809        B              3         0.720
  6      0.030     0.839        B              4         0.120
  7      0.041     0.880        D              5         0.205
  8      0.099     0.979        D              4         0.396
                                             Total       3.546
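The arithmetic of the worked example can be reproduced directly from the table. In the sketch below the last interval is taken as $0.979 - 0.880 = 0.099$, which is what the column total 3.546 requires; the function names are hypothetical, and the point estimate of $\theta$ is simply the value at which the $\chi^2$ quantity equals its expectation $2n$.

```python
# (interval dT, population N during the interval, type of the event ending it)
TABLE_3 = [
    (0.208, 5, "D"), (0.140, 4, "D"), (0.063, 3, "D"), (0.158, 2, "B"),
    (0.240, 3, "B"), (0.030, 4, "B"), (0.041, 5, "D"), (0.099, 4, "D"),
]

def summarize(records):
    """Exposure sum, number of deaths, and the two point estimates."""
    n = len(records)
    exposure = sum(dt * pop for dt, pop, _ in records)    # sum of N(T_r) * dT_r
    deaths = sum(1 for _, _, kind in records if kind == "D")
    theta_hat = n / exposure          # sets 2 * theta * exposure equal to 2n
    inv_phi_hat = n / deaths          # N/d, the inverse-sampling estimate of 1/phi
    return exposure, deaths, theta_hat, inv_phi_hat

def chi_square_statistic(theta, exposure):
    """2 * theta * sum of N dT, a chi-square with 2n degrees of freedom."""
    return 2.0 * theta * exposure
```

An exact confidence interval for $\theta$ would divide tabulated $\chi^2$ percentage points for 16 degrees of freedom by twice the exposure sum; no quantile routine is included here.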


If independent replications of transitory or other evolutionary
processes are available, then it has already been noted that
particular features may be estimated by standard methods
based on these replications. As an example of such a procedure,
Bailey (1953) has discussed the estimation of the ratio of ‘recovery’
to infectivity from observations of the total eventual number
of persons infected in initial groups of susceptibles of given sizes,
the model considered being of the simplest type with constant
infectivity and ‘recovery’ (including transition to non-infectivity
by removal, etc.) rates, with no immigration of new susceptibles
(cf. §4-4).
Chapter 9


CORRELATION ANALYSIS OF TIME-SERIES


9-1 Correlation and regression analysis of stationary
sequences


The theory of stationary processes developed in Chapter 6
throws a powerful light on the possibilities of analysing time-series.
Correlation analysis and harmonic analysis, often treated
as two distinct methods, are seen to be theoretically equivalent
if correctly interpreted. On the other hand, actual time-series
of a stationary type will not necessarily possess the simple
harmonic structure assumed in classical periodogram analysis;
they very often have a continuous, not a discrete, spectrum. Thus
the statistical analysis of time-series is logically greatly helped
by the general mathematical theory; but the latter also raises
many more problems connected with sampling fluctuations.

In some physical processes, the extent of the series available
for study may be as much as desired, and the measurement of
correlation coefficients accurately carried out to the limits of
experimental error by means of the ergodic relation between
‘phase’ (probability) and time averages. This seems true, for
example, in turbulence measurements. In statistical time-series
in other fields (especially in economics) the length of series
available for study is often severely limited. In all cases, however,
it is important to bear in mind the magnitude of the
sampling errors. This is perhaps most strikingly illustrated in
the case of harmonic analysis, where any direct attempt to
estimate the spectral function may give rise to sampling
fluctuations that do not diminish with the length of series taken.

In this last chapter we confine our attention mainly to the
correlational or equivalent harmonic structure of real stationary
time-series. In spite of some exact results, the sampling theory
of time-series, like that of stochastic processes in general, is
still in a comparatively early stage of development, and in
our discussion we shall be concerned mostly with presenting
some of the more important ‘large-sample’ results. We do not
consider series which are not stationary; if a series has a systematic
trend this can be removed, but if it is essentially of the
evolutionary type, then a rather well-specified model is likely to
be required for a profitable analysis to be possible. The sampling
theory of this chapter is of course relevant to the prediction
theory of Chapter 7 if the correlation structure of a series is not
known a priori, but has to be inferred from observation.

We shall consider first stationary sequences. This may appear 
somewhat in contrast with Chapters 6 and 7, where the theory of 
continuous series was developed immediately, but the obser- 
vational data, in statistical applications at least, are usually of 
the sequence type, and moreover, as we shall see, a specification 
for a continuous series may sometimes be conveniently tested by 
means of the sampling theory for sequences. 

Sampling fluctuations of means and correlation coefficients. The
sampling properties of means and correlation coefficients
obtained from a series of $n$ observations $X_1, X_2, \ldots, X_n$ (corresponding
to time values $t = 1, 2, \ldots, n$) may to a considerable
extent be investigated by straightforward algebra. Thus for the
sample mean defined as

$$\bar X = \frac{1}{n}\sum_{r=1}^{n} X_r, \tag{1}$$

we obviously have $E\{\bar X\} = E\{X\}$, which will usually be put zero.
We have further

$$\sigma^2(\bar X) = \frac{\sigma^2}{n}\sum_{s=-n+1}^{n-1}\Big(1-\frac{|s|}{n}\Big)\rho_s, \tag{2}$$

so that the variance of $\bar X$ is asymptotically $\sim \tilde\sigma^2/n$, where

$$\tilde\sigma^2 = \lim_{n\to\infty}\sigma^2\sum_{s=-n+1}^{n-1}\Big(1-\frac{|s|}{n}\Big)\rho_s \tag{3}$$

is assumed finite (from the theory of Chapter 6 this will not be
true if the spectral function $F(\omega)$ has a non-zero jump at $\omega=0$;
or in more practical terminology, it is assumed that $X_t$ has no
component which, while zero on the average, remains constant
for any single realization).
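The finite-sample variance of the mean and its limit are easy to compare numerically. The sketch below assumes the variance takes the usual form $\sigma^2/n$ times the sum of $(1-|s|/n)\rho_s$ over $|s|<n$; the function names are hypothetical, and the geometric correlogram used in the usage note is only an illustration.

```python
def var_of_mean_factor(rho, n):
    """n * var(mean) / sigma^2: the sum of (1 - |s|/n) * rho(|s|) over |s| < n."""
    return sum((1.0 - abs(s) / n) * rho(abs(s)) for s in range(-n + 1, n))

def limiting_factor(rho, terms=10_000):
    """Limit of the same sum as n grows, truncated after `terms` lags."""
    return rho(0) + 2.0 * sum(rho(s) for s in range(1, terms))
```

For a correlogram $\rho_s = \beta^{|s|}$ with $\beta = 0.5$, `limiting_factor` returns $(1+\beta)/(1-\beta) = 3$, while `var_of_mean_factor` approaches that value from below as `n` increases.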

Consider next the covariance and correlation functions. If
we know $E\{X\}=0$, we may define the sample covariances by

$$C_s = \frac{1}{n-s}\sum_{r=1}^{n-s} X_rX_{r+s} \quad (s \ge 0). \tag{4}$$

In this case $E\{C_s\}=\rho_s\sigma^2$. If we do not know the value of the
true mean $E\{X\}$, we may replace $X_r$ and $X_{r+s}$ in (4) by $X_r-\bar X_r$
and $X_{r+s}-\bar X_{r+s}$, where $\bar X_r$ is the mean of the $X_r$ occurring in (4).
In this case we obtain a mean value for $C_s$ of

$$\rho_s\sigma^2 - E\{\bar X_r\bar X_{r+s}\} \sim \rho_s\sigma^2 - \frac{\tilde\sigma^2}{n-s}, \tag{5}$$

$\tilde\sigma^2$ being the limit defined in (3).

We shall here record only the dominant term in the formulae
for the variances and covariances of $C_s$ ($s = 0, 1, \ldots$), and to this
order of approximation the effect on random sampling errors of
measuring $X_r$ from the sample mean can be neglected. The bias
indicated by (5) is also of a smaller order of magnitude than
sampling fluctuations, which are $O(1/\sqrt n)$, but being a systematic
effect is sometimes worth correcting.
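From (4)

$$\operatorname{cov}(C_s, C_{s+t}) = \frac{1}{n-s}\sum_{v=-(n-s)+1}^{n-s-t-1}\Big[1-\frac{a(v)}{n-s-t}\Big]\phi(v),$$

where

$$a(v) = \begin{cases} v & (v>0),\\ 0 & (-t \le v \le 0),\\ -v-t & (-(n-s)+1 \le v < -t),\end{cases}$$

and where

$$\phi(v) = (\rho_v\rho_{v+t} + \rho_{v+s+t}\rho_{v-s})\sigma^4 + \kappa_{v,s,t},$$

$\kappa_{v,s,t}$ denoting the fourth-order cumulant between $X_u$, $X_{u+s}$,
$X_{u+v}$ and $X_{u+v+s+t}$. (It should be recalled that

$$E\{X_uX_{u+s}X_{u+v}X_{u+v+s+t}\} = E\{X_uX_{u+s}\}E\{X_{u+v}X_{u+v+s+t}\} + E\{X_uX_{u+v}\}E\{X_{u+s}X_{u+v+s+t}\} + E\{X_uX_{u+v+s+t}\}E\{X_{u+s}X_{u+v}\} + \kappa_{v,s,t}.)$$

For large $n$, we have approximately

$$\operatorname{cov}(C_s, C_{s+t}) \sim \frac{1}{n-s}\sum_{v=-\infty}^{\infty}\phi(v), \tag{6}$$

when this sum converges (the more exact sum involving $\phi(v)$
will always converge under specifiable conditions, as for the limit in
(3), obtainable by treating $C_s$ as a mean of a new stochastic
process $X_rX_{r+s}$).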
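Denoting the sample covariance of the observations $X_i, X_{i+1},
\ldots, X_j$ more fully by $C_s(i,j)$, we define the sample correlation
coefficient $R_s$ by

$$R_s = C_s(1,n)/[C_0(1,n-s)\,C_0(s+1,n)]^{1/2}. \tag{7}$$

Noting that $R_s \sim C_s/C_0$, we obtain from the formula

$$\operatorname{cov}\Big(\frac{U}{W},\frac{V}{W}\Big) \sim \frac{\operatorname{cov}(U,V)}{w^2} - \frac{u\operatorname{cov}(V,W)}{w^3} - \frac{v\operatorname{cov}(U,W)}{w^3} + \frac{uv\operatorname{var}(W)}{w^4},$$

which holds when the variation of $U$, $V$ and $W$ about $u$, $v$ and $w$
($w>0$) respectively becomes small,

$$\operatorname{cov}(R_s, R_{s+t}) \sim \frac{1}{n-s}\sum_{v=-\infty}^{\infty}\big(\rho_v\rho_{v+t} + \rho_{v+s+t}\rho_{v-s} + 2\rho_s\rho_{s+t}\rho_v^2 - 2\rho_s\rho_v\rho_{v-s-t} - 2\rho_{s+t}\rho_v\rho_{v-s}\big), \tag{8}$$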
if the component depending on the cumulant term $\kappa_{v,s,t}$ may be
neglected. This component is necessarily zero for normal processes.
However, it should also be observed that for any linear
process of the type

$$X_t = \sum_{u=0}^{\infty} g_uY_{t-u},$$

where the $Y_t$ are independent disturbances with the same distribution,
we have (cf. equation (16), § 5-2)

$$\kappa_{v,s,t} = \kappa_4(Y)\sum_{u=-\infty}^{\infty} g_ug_{u+s}g_{u+v}g_{u+v+s+t} \quad (g_u = 0,\ u<0),$$

whence

$$\sum_{v=-\infty}^{\infty}\kappa_{v,s,t} = \kappa_4(Y)\sum_{u=-\infty}^{\infty} g_ug_{u+s}\sum_{w=-\infty}^{\infty} g_wg_{w+s+t} = \gamma(Y)\operatorname{cov}(X_r, X_{r+s})\operatorname{cov}(X_r, X_{r+s+t}),$$

where $\gamma(Y)=\kappa_4(Y)/\sigma^4(Y)$. This result will be found to ensure
that for such processes also the neglected $\kappa_{v,s,t}$ term in (8)
vanishes automatically.

As special cases of (8) we have, when $\rho_s \to 0$ as $s$ increases, the
formulae

$$\operatorname{var}(R_s) \sim \frac{1}{n-s}\sum_{v=-\infty}^{\infty}\rho_v^2, \qquad \operatorname{cov}(R_s, R_{s+t}) \sim \frac{1}{n-s}\sum_{v=-\infty}^{\infty}\rho_v\rho_{v+t}. \tag{9}$$

(It is assumed in (9) that $\sum\rho_{v+s+t}\rho_{v-s}$ is small when $\rho_s$ is small.) The
formulae (8) and (9) are useful in indicating the magnitude and
correlations of sampling fluctuations in the observed correlogram
(graph of the observed correlation coefficients), for processes
with true correlograms of the ‘damped’ type ($\rho_s \to 0$ as $s$ increases).
While, however, the sampling fluctuations in $R_s$ are
of $O(1/\sqrt n)$ and decrease with increase of $n$, they have the disadvantage,
when the true values $\rho_s$ are not known, of depending
on these unknown values. This stresses the difficulty of purely
empirical correlation studies with time-series. It seems desirable
to have some theoretical model in mind for the structure of the
series, depending on only a few unknown parameters; these can
then be estimated, and the theoretical correlogram corresponding
to the fitted model then compared with the observed correlogram
in the light of the above sampling errors.
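To see what such sampling errors look like for a given model, the large-sample covariance sum can be evaluated numerically. The sketch below assumes the familiar five-term form of the sum over $v$ (with $\rho_{-v} = \rho_v$) and returns the sum only, leaving the prefactor $1/(n-s)$ to the caller. The function name and the truncation `lags` are illustrative.

```python
def bartlett_sum(rho, s, t=0, lags=500):
    """Sum over v of the large-sample terms for cov(R_s, R_{s+t}), cumulants neglected."""
    r = lambda k: rho(abs(k))            # the correlogram is symmetric in the lag
    total = 0.0
    for v in range(-lags, lags + 1):
        total += (r(v) * r(v + t) + r(v + s + t) * r(v - s)
                  + 2.0 * r(s) * r(s + t) * r(v) ** 2
                  - 2.0 * r(s) * r(v) * r(v - s - t)
                  - 2.0 * r(s + t) * r(v) * r(v - s))
    return total
```

For a correlogram $\rho_s = \beta^{|s|}$ the value at $s = 1$, $t = 0$ is $1-\beta^2$, and for large $s$ it settles at $\sum\rho_v^2 = (1+\beta^2)/(1-\beta^2)$, which is the behaviour the large-lag special case describes.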

Time-series composed of exact harmonic oscillations are more
naturally dealt with by harmonic analysis; although if the
correlogram is constructed, it will also exhibit corresponding
undamped oscillations. The correlogram analysis is more useful
for oscillatory series which do not consist of simple undisturbed
harmonic oscillations and exhibit a damped correlogram. An
important method of analysis of such series is the use of the
linear autoregressive model first introduced by Yule. This is
a very general model for non-deterministic series if unlimited
in extent, and for uncorrelated residuals (cf. § 6-3); a much
narrower practical hypothesis consists in assuming (i) that a
few terms ($p$, say) of the series

$$X_t - Y_t = a_1X_{t-1} + a_2X_{t-2} + \ldots + a_pX_{t-p} \tag{10}$$

are sufficient, where (ii) the $Y_t$ are independent residuals. The
unknown coefficients in such a model may be estimated by
least squares, i.e. from the equations obtained by minimizing

$$\sum_{t=1}^{n-p} Y_{t+p}^2 = \sum_{t=1}^{n-p}(X_{t+p} - a_1X_{t+p-1} - \ldots - a_pX_t)^2.$$

The simplest model of type (10) which provides a quasi-periodic
oscillatory series is the case $p = 2$, i.e.

$$X_t = a_1X_{t-1} + a_2X_{t-2} + Y_t. \tag{11}$$
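Multiplying (11) by $X_{t-s}$ and averaging, we obtain (cf. § 5-2)

$$\rho_s = a_1\rho_{s-1} + a_2\rho_{s-2} \quad (s>0), \tag{12}$$

with in particular ($s = 1$)

$$\rho_1 = a_1/(1-a_2). \tag{13}$$

The solution of (12) is, for $-1 < a_2 < 0$ and $a_1^2 + 4a_2 < 0$,

$$\rho_s = \frac{(-a_2)^{s/2}\sin(s\theta+\psi)}{\sin\psi}, \tag{14}$$

where

$$\cos\theta = \frac{a_1}{2\sqrt{-a_2}}, \qquad \tan\psi = \frac{1-a_2}{1+a_2}\tan\theta.$$

Formula (14), or more simply formulae (12) and (13), may be
used to calculate $\rho_s$ when $a_1$ and $a_2$ are known. The least-squares
estimates $\hat a_1$ and $\hat a_2$ are given by

$$\hat a_1\sum X_{t+1}^2 + \hat a_2\sum X_{t+1}X_t = \sum X_{t+2}X_{t+1},$$
$$\hat a_1\sum X_{t+1}X_t + \hat a_2\sum X_t^2 = \sum X_{t+2}X_t,$$

or approximately, by substituting $R_s$ for $\rho_s$ in (12) for $s=1$
and 2,

$$\hat a_1 = \frac{R_1(1-R_2)}{1-R_1^2}, \qquad \hat a_2 = \frac{R_2-R_1^2}{1-R_1^2}. \tag{15}$$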
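The relations for the second-order scheme can be checked against one another numerically. The sketch below assumes the scheme $X_t = a_1X_{t-1} + a_2X_{t-2} + Y_t$, builds the correlogram from the recursion started at $\rho_1 = a_1/(1-a_2)$, compares it with the damped-sine solution, and inverts the first two correlations for the coefficients. Function names are hypothetical.

```python
import math

def correlogram(a1, a2, max_lag):
    """rho_0..rho_max_lag from rho_s = a1*rho_{s-1} + a2*rho_{s-2}."""
    rho = [1.0, a1 / (1.0 - a2)]
    for s in range(2, max_lag + 1):
        rho.append(a1 * rho[s - 1] + a2 * rho[s - 2])
    return rho[: max_lag + 1]

def damped_sine(a1, a2, s):
    """Closed form of rho_s; needs -1 < a2 < 0 and a1^2 + 4*a2 < 0."""
    theta = math.acos(a1 / (2.0 * math.sqrt(-a2)))
    # tan(psi) = (1 - a2) / (1 + a2) * tan(theta), taken on the branch sin(psi) > 0
    psi = math.atan2((1.0 - a2) * math.sin(theta), (1.0 + a2) * math.cos(theta))
    return (-a2) ** (s / 2.0) * math.sin(s * theta + psi) / math.sin(psi)

def coefficients_from(r1, r2):
    """Solve the recursion at s = 1, 2 for a1, a2 given the first two correlations."""
    a1 = r1 * (1.0 - r2) / (1.0 - r1 ** 2)
    a2 = (r2 - r1 ** 2) / (1.0 - r1 ** 2)
    return a1, a2
```

With observed $R_1$, $R_2$ in place of the true values, `coefficients_from` gives the approximate least-squares estimates, and `correlogram` then yields the fitted correlogram for comparison with the observed one.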


We cannot quote the usual standard error formulae for least- 
squares estimates without further justification, since the X's 
play simultaneously the roles of dependent and independent 
variables (in the regression sense). Mann and Wald (1943) have, 
however, shown that these least-squares error formulae are still 
asymptotically valid for large samples. A simple (unrigorized) 
demonstration in the case of (11) is given below. 

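Writing $\sum X_{t+1}Y_{t+2} = A$, $\sum X_tY_{t+2} = B$, and then adding these
equations, after substituting for $Y_{t+2}$ in terms of $X_t$ from (11),
to the exact least-squares equations given above, we obtain

$$\delta a_1\sum X_{t+1}^2 + \delta a_2\sum X_tX_{t+1} = A,$$
$$\delta a_1\sum X_tX_{t+1} + \delta a_2\sum X_t^2 = B,$$

where $\delta a_1 = \hat a_1 - a_1$, $\delta a_2 = \hat a_2 - a_2$. Using such results as

$$E\{X_{t+1}Y_{t+2}X_{u+1}Y_{u+2}\} = 0$$

unless $u = t$ (e.g. if $u > t$,

$$E\{X_{t+1}Y_{t+2}X_{u+1}Y_{u+2}\} = E\{Y_{u+2}\}E\{Y_{t+2}X_{t+1}X_{u+1}\} = 0),$$

we have

$$E\{A^2\} \sim E\{B^2\} \sim n\sigma^2(Y)\sigma^2(X), \qquad E\{AB\} \sim n\rho_1\sigma^2(Y)\sigma^2(X),$$

whence

$$\sigma^2(\hat a_1) \sim \sigma^2(\hat a_2) \sim \frac{\sigma^2(Y)}{n(1-\rho_1^2)\sigma^2(X)}.$$

Squaring and averaging both sides of (11), we obtain

$$\frac{\sigma^2(Y)}{\sigma^2(X)} = \frac{(1+a_2)\{(1-a_2)^2-a_1^2\}}{1-a_2}, \tag{16}$$

whence

$$\sigma^2(\hat a_1) \sim \sigma^2(\hat a_2) \sim (1-a_2^2)/n. \tag{17}$$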
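The step from the variance ratio to the final error formula is short but easy to lose. As a sketch, assuming $\rho_1 = a_1/(1-a_2)$ and that squaring and averaging the second-order scheme gives $\sigma^2(Y)/\sigma^2(X) = (1+a_2)\{(1-a_2)^2-a_1^2\}/(1-a_2)$, the intermediate algebra runs:

```latex
\begin{align*}
1-\rho_1^2 &= 1-\frac{a_1^2}{(1-a_2)^2} = \frac{(1-a_2)^2-a_1^2}{(1-a_2)^2},\\[4pt]
\frac{\sigma^2(Y)}{\sigma^2(X)\,(1-\rho_1^2)}
  &= \frac{(1+a_2)\{(1-a_2)^2-a_1^2\}}{1-a_2}\cdot\frac{(1-a_2)^2}{(1-a_2)^2-a_1^2}
   = (1+a_2)(1-a_2) = 1-a_2^2 .
\end{align*}
```

Dividing by $n$ then gives the asymptotic variance $(1-a_2^2)/n$ for both coefficients, which notably does not involve $a_1$.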


Least-squares estimation has been considered here as providing
efficient ‘linear’ methods of estimation, using the term ‘linear’
in the sense that the deviations of the estimates from the true
coefficients are linear in the residuals $Y$; and while the variables
$X_t$ are not independent of $Y_t$ as in the classical regression or
least-squares model, the above results indicate that asymptotically
this does not affect the error formulae. If exact information were
available on the distribution of the residuals $Y$, this could be used
to provide fully efficient estimates based on the maximum-likelihood
method (see § 8-3); in the important case when the
$Y_t$ are not only independent but normal (if they are uncorrelated
and normal they are necessarily independent), the methods are
asymptotically equivalent, owing to the form of the log likelihood
function

$$L \sim -\frac{n-p}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=p+1}^{n} Y_t^2 + \text{constant}, \tag{18}$$

where $\sigma^2 = \sigma^2(Y)$.


9-11 Goodness of fit tests. In conformity with the
principles of § 8-1, an asymptotic goodness of fit test of an
autoregressive series of low order $p$ could be made within the class of
autoregressive series of some higher order $q$, say, if full specification
of the form of the likelihood function is available. Thus on
the assumption of normal residuals we readily obtain from (18)
(with neglect of end corrections of relative order $1/n$)

$$\chi^2 \sim (\Sigma_p' - \Sigma_q)/\sigma^2, \tag{1}$$

where $\Sigma_q$ denotes the sum of squares of residuals after fitting a
series of order $q$, and $\Sigma_p'$ after fitting one of order $p$ (the dash
denotes that $\Sigma_p$ has been made comparable with $\Sigma_q$, for example,
by taking only the last $n-q$ residuals). From the asymptotic
properties of the fitted coefficients, which include their asymptotic
normality (Mann and Wald, 1943), the classical asymptotic
likelihood theory applies, and the approximate $\chi^2$ quantity in
(1) has $q-p$ degrees of freedom.

If $\sigma^2$ is also not known, a logarithmic form is obtained,
asymptotically equivalent to

$$\chi^2 \sim -(n-q)\log(\Sigma_q/\Sigma_p'). \tag{2}$$

The approximate form (2) has the advantage that it depends only
on variance ratios, that is, on correlations, and preserves its
asymptotic distributional properties under the wider assumption
of independent but not necessarily normal residuals. This follows
from the asymptotic joint normal properties of the serial correlations
$R_s$ established by Mann and Wald, whose assumptions
were of this wider form (finite moments of all orders for $Y_t$ are
assumed).
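The logarithmic criterion is straightforward to compute once both schemes have been fitted. The sketch below is one possible arrangement under stated assumptions: the series is taken to have zero mean, both residual sums are evaluated over the last $n-q$ terms so that they are comparable, and the helper names (`solve`, `fit_ar`, `residual_ss`, `chi_square`) are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ar(x, order):
    """Least-squares coefficients a_1..a_order of X_t on its own past values."""
    rows = [[x[t - k] for k in range(1, order + 1)] for t in range(order, len(x))]
    target = x[order:]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    rhs = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(order)]
    return solve(gram, rhs)

def residual_ss(x, coeffs, start):
    """Sum of squared residuals over t = start, ..., len(x) - 1."""
    return sum((x[t] - sum(a * x[t - k] for k, a in enumerate(coeffs, 1))) ** 2
               for t in range(start, len(x)))

def chi_square(x, p, q):
    """-(n - q) * log(S_q / S'_p), both sums taken over the last n - q residuals."""
    n = len(x)
    s_p = residual_ss(x, fit_ar(x, p), q)
    s_q = residual_ss(x, fit_ar(x, q), q)
    return -(n - q) * math.log(s_q / s_p)
```

The returned value would be referred to $\chi^2$ with $q-p$ degrees of freedom. Because the order-$p$ scheme is nested in the order-$q$ scheme and both sums use the same residual positions, the statistic cannot be negative apart from rounding.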


The use of $\chi^2$ tests of the above type based on the likelihood
criterion has been advocated by Whittle. While derived on the
... as (2) to this assumption allows their more general use. One
practical disadvantage appears to be the explicit formulation of
the alternative class of hypothesis, for although we have seen
that this class for fairly large $q$ includes a wide class of
non-deterministic time-series, it implies that the autoregressive
coefficients have in effect to be estimated not only for the model
... the alternative of higher order $q$, the ... simultaneous linear
equations with associated determinants of order $q$.

In view of this, it will sometimes be more convenient to make
use of a goodness of fit test due to Quenouille and
formulated purely in terms of the assumed autoregressive model
... where this equation is of autoregressive type (i.e. it can be put
in the form $X_t = G_tX_t + Y_t$, where $G_t$ is a ‘backward’ operator).
We consider the properties of $H_tC_t$, where this notation means
that the operator $H_t$ is to act on the $X$'s occurring in the
covariance $C_t$. We have

$$E\{H_tC_t\} = E\Big\{\frac{1}{n-t}\sum_{u=1}^{n-t} X_uY_{u+t}\Big\} = 0 \quad (t>0),$$

$$E\{H_tC_t\,H_rC_r\} = E\Big\{\frac{1}{(n-t)(n-r)}\sum_{u=1}^{n-t}\sum_{v=1}^{n-r} X_uY_{u+t}X_vY_{v+r}\Big\} = \frac{1}{n-r}E\{X_uX_{u+t-r}\}E\{Y^2\} \quad (0 < r \le t),$$

$$= \frac{\sigma^2(X)\,\sigma^2(Y)\,\rho_{t-r}}{n-r}. \tag{4}$$

Now from (3), multiplying by $H_rX_r$ and averaging, we have
$H_tH_r\rho_{t-r}\,\sigma^2(X) = \delta_{tr}\,\sigma^2(Y)$. This result, together with (4), enables
us to choose a linear combination of $C_t$ which is uncorrelated
with the corresponding function of $C_r$ ($r \ne t$). Since

$$H_tH_r\rho_{t-r} = H_tH_{-t}\rho_{t-r},$$

we have, in fact, a choice between $H_t^2C_t$ ($t>p$) and $H_{-t}H_tC_t$ ($t>0$).
The latter choice has the attraction of a simple correspondence
with the covariances of the $Y$'s, since

$$\frac{1}{n-t}\sum_{u=1}^{n-t} Y_uY_{u+t} = \frac{1}{n-t}\sum_{u=1}^{n-t} H_uX_u\,H_{u+t}X_{u+t} \sim H_{-t}H_tC_t \quad (t>0).$$

However, the alternative choice of $H_t^2C_t$, which was the form
proposed by Quenouille, appears somewhat more satisfactory,
at least in the usual case when unknown parameters are estimated.
Thus if we consider the asymptotic distribution of
$(n-t)^{1/2}h_t^2C_t$ ($t>p$), where $h_t$ denotes the operator $H_t$ after
substitution of our estimates of the unknown parameters, we find
for the difference

$$(n-t)^{1/2}(h_t^2 - H_t^2)C_t = (n-t)^{1/2}(h_t - H_t)(h_t + H_t)C_t$$
$$\sim (n-t)^{1/2}(h_t - H_t)(2H_tC_t)$$
$$\sim (n-t)^{1/2}(h_t - H_t)(2H_t\rho_t)\,\sigma^2(X)$$
$$= 0 \quad (t>p),$$
where $\sim$ denotes asymptotic equivalence, and it is assumed that
the errors in the coefficients are $O\{(n-t)^{-1/2}\}$, the identity
$H_t\rho_{t-r} = 0$ ($t>r$) following from (3) after multiplication by $X_r$.
On the other hand,

$$(n-t)^{1/2}(h_{-t}h_t - H_{-t}H_t)C_t$$
$$= (n-t)^{1/2}(h_{-t}h_t - H_{-t}h_t + H_{-t}h_t - H_{-t}H_t)C_t$$
$$\sim (n-t)^{1/2}(h_{-t} - H_{-t})H_tC_t + (n-t)^{1/2}H_{-t}(h_t - H_t)C_t$$
$$\sim (n-t)^{1/2}(h_{-t} - H_{-t})H_t\rho_t\,\sigma^2(X) + (n-t)^{1/2}H_{-t}(h_t - H_t)C_t$$
$$= 0 + (n-t)^{1/2}H_{-t}(h_t - H_t)C_t \quad (t>0),$$

which is not zero asymptotically. This establishes the asymptotic
equivalence of $(n-t)^{1/2}h_t^2C_t$ with $(n-t)^{1/2}H_t^2C_t$, but not of
$(n-t)^{1/2}h_{-t}h_tC_t$ with $(n-t)^{1/2}H_{-t}H_tC_t$.

We have also from (4), if we denote $H_tC_t$ by $D_t$,

$$E\{(H_tD_t)D_r\} = \sigma^2(X)\,\sigma^2(Y)\,H_t\rho_{t-r}/(n-r) = 0 \quad (t>r).$$

Thus the functions $H_tD_t = H_t^2C_t$ are also uncorrelated with $D_r$
($r<t$). Dividing through by $C_0^2 \sim \sigma^4(X)$, we obtain finally the
approximate result

$$E\{H_t^2R_t\,H_r^2R_r\} \sim \frac{\delta_{rt}}{n-t}\,\frac{\sigma^4(Y)}{\sigma^4(X)}, \tag{5}$$

and since the functions $H_t^2R_t$ tend to normality as $n$ increases,
they should be approximately independent normal variables.
In the case of the process (11) of § 9-1 we obtain $H_t^2R_t$ as

$$R_t - 2a_1R_{t-1} + (a_1^2 - 2a_2)R_{t-2} + 2a_1a_2R_{t-3} + a_2^2R_{t-4},$$

to be calculated, when $R_1$ and $R_2$ are used for estimating $a_1$ and
$a_2$, for $t=3, 4, \ldots$.
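The squared operator can be expanded mechanically for any order. The sketch below assumes $H = 1 - a_1B - \ldots - a_pB^p$ with $B$ the backward shift, and uses $R_{-k} = R_k$ where the expansion reaches a negative lag. The function names are hypothetical.

```python
def operator_square(a):
    """Coefficients of H^2 in powers of the backward shift B."""
    h = [1.0] + [-c for c in a]
    sq = [0.0] * (2 * len(h) - 1)
    for i, hi in enumerate(h):
        for j, hj in enumerate(h):
            sq[i + j] += hi * hj          # polynomial multiplication of H by itself
    return sq

def apply_h2(r, a, t):
    """H^2 R_t = sum over k of c_k * R_{t-k}; r[k] holds R_k, with r[0] = 1."""
    return sum(c * r[abs(t - k)] for k, c in enumerate(operator_square(a)))
```

For the second-order scheme `operator_square([a1, a2])` returns the five coefficients $1,\ -2a_1,\ a_1^2-2a_2,\ 2a_1a_2,\ a_2^2$, and applying the result to the true correlogram gives zero for every $t > 2$, so departures of the observed values from zero are what the test examines.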
In the case of processes for which $H_t$ does not represent a
finite autoregressive form, use of the above method leads to
some difficulties. For example, in the case of ‘moving averages’
where we have

$$X_t = J_tY_t, \tag{6}$$

where $J_t$ is of finite length, the formal inversion of this equation
leads to an operator $H_t \equiv J_t^{-1}$ of infinite length. This case can be
dealt with by a method developed by Wold (1949), or alternatively
by extending the above method to correlated residuals.
This latter method has the advantage of covering also those
cases of time-series specified for continuous time (see § 9-12)
which are expressible in this way.

Considering this latter method, we assume now that in
equation (3) $Y_t$ is ‘$m$-dependent’ (i.e. correlated to a length $m$).
It can be shown by similar methods that H, D,,,, and H_,D, then
have zero expectation for $t > m$, and also


E(HiR,,H? Ry 4p} ~ E(H H, RHH, R} 


"m oY) S 56 

n—T oi X) sm. iVi--7? (7) 
where $\lambda_i$ denotes the autocorrelation of lag $i$ of $Y_t$. Equation (7)
holds in general for ... $2m$, and may often be assumed true
... if the process $X_t$ is a discrete ‘linear process’ (or
normal). From (7) it will be seen that we arrive at $2m+1$
sequences such that each term has zero mean and any two terms
in a sequence are uncorrelated. This leads to a total $\chi^2$ for
each sequence. In practice there is some disadvantage in this
multiplicity of partial tests; a simple though rather rough method
of combining them is to sum all the $\chi^2$'s as if they were independent,
and then reduce the total $\chi^2$ and its apparent degrees of
freedom (the total number of individual items) by the factor

TD. 


1/ X o" (8) 

i--o 
Where the 3j; are the correlations between the individual com- 
Ponents before squaring. A more exact test would involve the 


inversion of the matrix with (¢,7)th element 


m 
E Obir 
T 2. i--m 
This is of Laurent type, and an iterative procedure used by Wold
(1949) for finding the inverse of such matrices is available if
required.

The above extension of the $\chi^2$ test to $m$-dependent residuals
depends on the correlation forms still being asymptotically
normal, but this is covered by theorems due to Diananda (1953)
and Walker (1954).

Fitting autoregressive constants successively. It is instructive
to examine the effect of fitting one further coefficient to an
autoregressive scheme (cf. Lee, 1951). This may be done quite
simply, and indicates the relevance of the expressions $h_t^2 r_{p+t}$
($t = 1, 2, \ldots$) for testing significance.

The least-squares equations for the scheme

$$HX_t \equiv X_t - a_1X_{t-1} - \ldots - a_pX_{t-p} = Y_t \tag{9}$$

are

$$h_tr_t = 0 \quad (t = 1, \ldots, p). \tag{10}$$
We note also that 

An [*(Y)Jo*(X)]s, (t=). (11) 

Equation (11), together with (10), may equivalently be written 

hyn-be*«Yyge*«X)ha. (t= 0). (12) 


Under the new hypothesis

$$H'X_t \equiv X_t - a_1'X_{t-1} - \ldots - a_{p+1}'X_{t-p-1} = Y_t \tag{13}$$

the equations (10) are replaced by

$$h_t'r_t = 0 \quad (t = 1, \ldots, p+1). \tag{14}$$

Operating on the set (14) by $h_t$ and making use of (10), we obtain
[LO = (7) sa =0, 


or = (ut aps1 , (18) 
LAM 


From (10), equation (15) is equivalently written 
2 
ee ds 
From the previous results we have 
EQ) sa) ^0, 
F(r)tL sa ~ c LY Jof (XT, 
whence, under the hypothesis (9), we have 
Bpo, oii, a7) 


a result which permits a simple asymptotic test of the significance of $a_{p+1}$. If it is significant, the adjusted values of the $a_i$ ($i = 1, \dots, p$) may be calculated from the $\hat a_i$ by a standard regression formula, which becomes in this context


&-8,-à ud, ua (—12,...,p). (18) 


More generally, it has been shown that the $\chi^2$ test using the $q - p$ forms $H_p^2 R_{t+p}$ ($t = 1, 2, \dots, q - p$) is asymptotically equivalent to the likelihood criterion developed at the beginning of this section for comparing the hypothesis of an autoregressive model of order $p$ with the wider hypothesis of one of higher order $q$ (for a more detailed discussion, see Walker (1952)).

Superposed error. Complications in fitting autoregressive schemes may arise in practice from additional errors; for example, from a random superposed error depressing the observed correlations. Thus if

$$U_t = X_t + Z_t, \tag{19}$$

where $Z_t$ is an independent sequence independent also of $X_t$, we have

$$\sigma^2(U) = \sigma^2(X) + \sigma^2(Z), \tag{20}$$

$$\operatorname{cov}(U_t, U_{t+\tau}) = \operatorname{cov}(X_t, X_{t+\tau}) \quad (\tau > 0). \tag{21}$$

Writing $\lambda = \sigma^2(Z)/\sigma^2(X)$, we have a depression of the correlations by a factor $1/(1+\lambda)$. If this is suspected, an estimate of the quantity $\lambda$ will also be necessary, and correspondingly for other error effects. It might be noticed that the series $U_t$ is of '$p$-dependent residual' type if $X_t$ is autoregressive of order $p$. The equations $H_p \rho_t = 0$ ($t > 0$) for $X_t$ are thus satisfied for $U_t$ for $t > p$; this is sometimes useful in providing simple consistent (though not fully efficient) equations of estimation for the original coefficients $a_i$, even in the presence of superposed error.
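The depression of the correlations by superposed error can be illustrated numerically. The sketch below is only an illustration under assumed values: a first-order autoregressive $X_t$ with correlations $0.8^s$ and an error ratio $\lambda = 0.25$ are hypothetical choices, and the function name is not from the text.

```python
def depressed_correlations(rho, lam):
    """Correlations of U_t = X_t + Z_t at lags >= 1, given those of X_t.

    lam is the ratio sigma^2(Z) / sigma^2(X); every correlation at a
    non-zero lag is reduced by the same factor 1 / (1 + lam).
    """
    return [r / (1.0 + lam) for r in rho]


# Hypothetical example: first-order autoregressive X_t with rho_s = 0.8 ** s.
a = 0.8
rho_x = [a ** s for s in range(1, 6)]   # lags 1..5
lam = 0.25                              # assumed error ratio
rho_u = depressed_correlations(rho_x, lam)

# For lags beyond p = 1 the relation rho_t - a * rho_{t-1} = 0 still holds
# for U_t, so successive ratios recover the original coefficient a.
ratios = [rho_u[i + 1] / rho_u[i] for i in range(len(rho_u) - 1)]

# At lag 1 itself the relation fails: rho_u[0] - a * 1 is not zero.
lag_one_defect = rho_u[0] - a

print(rho_u[0], ratios, lag_one_defect)
```

The ratios of successive depressed correlations return the coefficient unchanged, which is why the higher-lag equations remain usable for estimation when a superposed error is present.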


9.12. Time-series specified for continuous time. Many time-series are not of the type considered in the last section, but are recorded continuously (e.g. physical processes recorded by electrical or optical means; see, for example, fig. 6, §6.1). Even if a continuous record is not available, it is sometimes more reasonable to envisage a process of continuous type from which discrete observations are made than a process only definable for discrete time.

Two general remarks are relevant. First, if a discrete series of observations is used in place of a continuous record, the general sampling error formulae for the correlations given in the last section will still be valid. Secondly, if the continuous record is considered directly, the adaptation of the formulae is immediate. For example, the sampling variance and covariances of


$$C(\tau) = \frac{1}{T-\tau}\int_0^{T-\tau} X(t)\,X(t+\tau)\,dt \tag{1}$$

are exactly as in the case of discrete sums, with integrals replacing sums and $T$ replacing $n$; thus the formula corresponding to equation (6) of §9.1 is


cov (C(s), O(s +t)) 
npa f epet pto s) ptos] af 


from which other formulae may be deduced. The $\kappa$ term will be zero for normal processes, and its effect on the leading term of the corresponding formulae for the correlation coefficients $R(\tau)$ will also vanish for linear processes of the type

$$X(t) = \int_{-\infty}^{\infty} g(t-v)\,dY(v) \quad (g(u) = 0 \text{ for } u < 0), \tag{3}$$

where $Y(v)$ is an additive process. Apart from the $\kappa$ term, an equivalent formula to (2) (which assumes that the correlogram is damped and hence that the spectrum is continuous) is obtained with the use of Parseval's theorem and is


cov (C(s), C(s +t)) ~ zr]. f2(w) [eit + ci] do. (4) 


Regression analysis of these continuous processes, however, 
raises complications, for such processes are in practice likely to 
arise from differential equations, in contrast with the difference 
equations with discrete time, so that simple autoregressive 
models of the discrete type are not in general appropriate except 
perhaps as approximations. Methods of fitting models more 
consistent with the character of these processes can in principle 
be worked out, and will be illustrated by consideration of the 
model analogous to equation (11) of § 9-1 (cf. § 5-2, equation (19)) 


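$$d\dot X(t) + \alpha \dot X(t)\,dt + \beta X(t)\,dt = dY(t). \tag{5}$$

To estimate the unknown coefficients $\alpha$ and $\beta$, we minimize the sum of squares

$$\int_0^T [dY(t)]^2 = \int_0^T [d\dot X(t) + \alpha \dot X(t)\,dt + \beta X(t)\,dt]^2,$$

obtaining the formal solution

$$\begin{aligned} \int_0^T \dot X(t)\,[d\dot X(t) + \alpha \dot X(t)\,dt + \beta X(t)\,dt] &= 0, \\ \int_0^T X(t)\,[d\dot X(t) + \alpha \dot X(t)\,dt + \beta X(t)\,dt] &= 0. \end{aligned} \tag{6}$$

Since

$$\int_0^T \dot X(t)\,X(t)\,dt = \tfrac12 [X^2(t)]_0^T, \qquad \int_0^T X(t)\,d\dot X(t) = [X(t)\dot X(t)]_0^T - \int_0^T \dot X^2(t)\,dt,$$

the solutions of (6) are asymptotically equivalent to the estimates

$$\hat\alpha = -\int_0^T \dot X(t)\,d\dot X(t) \Big/ \int_0^T \dot X^2(t)\,dt, \qquad \hat\beta = \int_0^T \dot X^2(t)\,dt \Big/ \int_0^T X^2(t)\,dt, \tag{7}$$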
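and their asymptotic errors are

$$\sigma^2(\hat\alpha) \sim \kappa_Y \Big/ \int_0^T \dot X^2(t)\,dt \sim \frac{2\alpha}{T}, \qquad \sigma^2(\hat\beta) \sim \kappa_Y \Big/ \int_0^T X^2(t)\,dt \sim \frac{2\alpha\beta}{T}, \qquad \operatorname{cov}(\hat\alpha, \hat\beta) \sim 0, \tag{8}$$

where $\kappa_Y$ is the rate of increase of the variance of $Y(t)$.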

The above estimation equations are simplified by the orthogonality of $X(t)$ and $\dot X(t)$, implying an asymptotic orthogonality of these quantities in the sample. The estimate $\hat\alpha$ remains the same if $\beta = 0$, when (5) reduces to a linear Markov equation in $U(t) \equiv \dot X(t)$. For such a process we thus have the least-squares estimate

$$\hat\alpha = -\int_0^T U(t)\,dU(t) \Big/ \int_0^T U^2(t)\,dt,$$

with asymptotic standard error $\sqrt{(2\alpha/T)}$ (cf. equation (15), §8.3).
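A rough numerical check of the least-squares estimate for the linear Markov process is sketched below. It is an illustration only: the process is approximated on a fine grid by a simple Euler step, the integrals are replaced by sums, and the values of `alpha`, `kappa`, `T`, `dt` and the seed are arbitrary assumptions rather than anything taken from the text.

```python
import math
import random


def simulate_markov(alpha, kappa, T, dt, seed=0):
    """Euler approximation to dU + alpha*U dt = dY, with var(dY) = kappa*dt."""
    rng = random.Random(seed)
    steps = int(round(T / dt))
    u = [0.0] * (steps + 1)
    sd = math.sqrt(kappa * dt)
    for k in range(steps):
        u[k + 1] = u[k] - alpha * u[k] * dt + sd * rng.gauss(0.0, 1.0)
    return u


def estimate_alpha(u, dt):
    """Discretized form of  -(integral of U dU) / (integral of U^2 dt)."""
    num = sum(u[k] * (u[k + 1] - u[k]) for k in range(len(u) - 1))
    den = sum(u[k] ** 2 for k in range(len(u) - 1)) * dt
    return -num / den


alpha, T, dt = 1.0, 500.0, 0.01           # assumed illustrative values
u = simulate_markov(alpha, kappa=1.0, T=T, dt=dt, seed=12345)
alpha_hat = estimate_alpha(u, dt)
std_err = math.sqrt(2.0 * alpha / T)      # asymptotic standard error
print(alpha_hat, std_err)
```

With these settings the estimate should fall within a few multiples of the asymptotic standard error (about 0.063) of the assumed value 1.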
Returning to the second-order process (5), we recall that the least-squares equations for $a$ and $b$ in the discrete time case were asymptotically equivalent to the first two equations ($s = 1$ and 2) obtained from the difference relation (12) of §9.1 by substituting $R$ for $\rho$. Similarly, the above estimates correspond to the differential equation

$$\rho''(\tau) + \alpha\rho'(\tau) + \beta\rho(\tau) = 0 \quad (\tau > 0) \tag{9}$$

(obtained by multiplying equation (5) by $X(t-\tau)$ and averaging), evaluated at $\tau = 0$, and to its derivative, also evaluated at $\tau = 0$.

If the direct evaluation of these estimates is not practically convenient, alternative methods making use of the observed autocorrelations may be used, but the above limiting formulae are still useful in suggesting methods approximating to optimum (least-squares) efficiency, and their validity is incidentally checked by such further investigation. Thus the most direct alternative is to obtain $\alpha$ and $\beta$ by interpolatory methods from the correlations $R(h)$ and $R(2h)$, using the known relation (equivalent to equation (28), §5.2)

$$\rho(\tau) = \frac{e^{-\frac12\alpha|\tau|}\cos(\lambda|\tau| - \theta)}{\cos\theta}, \tag{10}$$

where $\tan\theta = \tfrac12\alpha/\lambda$, $\lambda = \sqrt{(\beta - \tfrac14\alpha^2)}$ ($\beta > \tfrac14\alpha^2$). It is useful to note from (9), or from the solution of (5) given in equation (21), §5.2, that $\rho(\tau)$ also satisfies the difference equation

$$\rho(\tau+2h) + a\rho(\tau+h) + b\rho(\tau) = 0 \quad (\tau \ge 0), \tag{11}$$

where

$$a = -2e^{-\frac12\alpha h}\cos\lambda h, \qquad b = e^{-\alpha h}. \tag{12}$$

This relation is convenient for computing the theoretical correlogram when $\alpha$ and $\beta$ have been estimated; and may sometimes be useful in providing more rapid (though somewhat less efficient) estimates than from $R(h)$ and $R(2h)$ in conjunction with equation (10). Moreover, such difference equations, for $\tau > 0$, are still valid even in the presence of superposed random error (cf. end of §9.11).
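The passage from the continuous coefficients to the difference-equation coefficients can be checked numerically. The sketch below evaluates relations (10) and (12) as read here and verifies (11) on a grid of lags; the function names are illustrative, $h = 1$ is assumed, and the input values are the estimates quoted for the sunspot series in §9.13.

```python
import math


def difference_coefficients(alpha, beta, h=1.0):
    """Coefficients a, b of relation (12); requires beta > alpha**2 / 4."""
    lam = math.sqrt(beta - 0.25 * alpha ** 2)
    a = -2.0 * math.exp(-0.5 * alpha * h) * math.cos(lam * h)
    b = math.exp(-alpha * h)
    return a, b


def rho(tau, alpha, beta):
    """Theoretical correlogram of relation (10)."""
    lam = math.sqrt(beta - 0.25 * alpha ** 2)
    theta = math.atan(0.5 * alpha / lam)
    t = abs(tau)
    return math.exp(-0.5 * alpha * t) * math.cos(lam * t - theta) / math.cos(theta)


alpha, beta = 0.3186, 0.3631      # estimates quoted for the sunspot series
a, b = difference_coefficients(alpha, beta)

# Left-hand side of (11) for tau = 0, 1, ..., 9 (unit spacing h = 1).
residuals = [
    rho(t + 2, alpha, beta) + a * rho(t + 1, alpha, beta) + b * rho(t, alpha, beta)
    for t in range(10)
]
print(round(a, 4), round(b, 4), max(abs(r) for r in residuals))
```

The computed coefficients agree with the values of $a$ and $b$ quoted for the continuous model in §9.13, and the residuals of (11) vanish to rounding error.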

As noted in the preceding section, a method of testing the goodness of fit of a theoretical correlogram of the above type makes use of the difference equation for $X(t)$. For example, from the solution for the process (5) it is easily verified that

$$X_t + aX_{t-1} + bX_{t-2} = Y_t, \tag{13}$$

where $a$ and $b$ are given by (12), but $Y_t$, while independent of $Y_{t+\tau}$ ($\tau > 1$), is correlated with $Y_{t+1}$. Thus in the notation of §9.11 for dependent residuals $m = 1$ (in general, for a $p$th order linear differential equation, $m = p - 1$). We have further

$$\begin{aligned} \sigma^2(Y) &= E\{(X_t + aX_{t-1} + bX_{t-2})\,Y_t\} \\ &= E\{(X_t + aX_{t-1} + bX_{t-2})(X_t + aX_{t-1})\} \\ &= \sigma^2(X)\,[1 + a^2 + 2a\rho_1 + b\rho_2 + ab\rho_1] \\ &= \sigma^2(X)\,\{1 + a^2 - b^2 + 2a\rho_1\}, \\ E\{Y_t Y_{t+1}\} &= E\{(X_{t+1} + aX_t + bX_{t-1})\,Y_t\} \\ &= E\{X_t (X_{t+1} + aX_t + bX_{t-1})\} \\ &= \sigma^2(X)\,\{\rho_1 + b\rho_1 + a\}. \end{aligned}$$

With these results we find

$$\frac{\sigma^4(Y)}{\sigma^4(X)} \sum_i \rho'_i \rho'_{i+t-\tau} = \begin{cases} (1 + a^2 - b^2 + 2a\rho_1)^2 + 2(a + \rho_1 + b\rho_1)^2 & (t = \tau); \\ 2(a + \rho_1 + b\rho_1)(1 + a^2 - b^2 + 2a\rho_1) & (t = \tau + 1); \\ (a + \rho_1 + b\rho_1)^2 & (t = \tau + 2); \\ 0 & (t > \tau + 2). \end{cases}$$


The first of these expressions /, say, leads to the asymptotic 
variances of the functions (n—1)? H2 js (or (n—10)* H.,H,Ej) 
and correspondingly the conversion factor from (Hi Riz)? to a 
X? component is (n— d)/u. The other quantities are needed either 
for testing roughly the significance of the total $\chi^2$ by use of the factor (8) of §9.11, or if a more exact matrix inversion is to be carried out. The process (13) is not a discrete linear process in the precise sense previously defined, and the last results are only true in general for $t \ge \tau > 2$.
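The covariance structure of the residuals $Y_t$ of the difference equation can be verified directly from the theoretical correlogram. The sketch below is illustrative only: it takes unit sampling interval, uses the continuous-model correlogram with hypothetical coefficient values, and computes the autocovariances of $Y_t = X_t + aX_{t-1} + bX_{t-2}$ (in units of $\sigma^2(X)$) by brute force for comparison with the closed expressions.

```python
import math


def rho(tau, alpha, beta):
    """Correlogram of the continuous second-order process."""
    lam = math.sqrt(beta - 0.25 * alpha ** 2)
    theta = math.atan(0.5 * alpha / lam)
    t = abs(tau)
    return math.exp(-0.5 * alpha * t) * math.cos(lam * t - theta) / math.cos(theta)


def residual_autocov(k, a, b, alpha, beta):
    """cov(Y_t, Y_{t+k}) / sigma^2(X) for Y_t = X_t + a X_{t-1} + b X_{t-2}."""
    c = (1.0, a, b)
    return sum(
        c[i] * c[j] * rho(k + i - j, alpha, beta)
        for i in range(3)
        for j in range(3)
    )


alpha, beta = 0.4, 0.5                    # hypothetical coefficients
lam = math.sqrt(beta - 0.25 * alpha ** 2)
a = -2.0 * math.exp(-0.5 * alpha) * math.cos(lam)
b = math.exp(-alpha)
r1 = rho(1, alpha, beta)

g = [residual_autocov(k, a, b, alpha, beta) for k in range(4)]
g0_closed = 1.0 + a * a - b * b + 2.0 * a * r1     # sigma^2(Y) / sigma^2(X)
g1_closed = a + r1 + b * r1                        # lag-one covariance
mu = g0_closed ** 2 + 2.0 * g1_closed ** 2         # the t = tau expression
print(g, g0_closed, g1_closed, mu)
```

The brute-force covariances at lags 2 and 3 vanish, confirming that the residuals are 1-dependent, and the lag 0 and lag 1 values match the closed forms.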

9.13. Numerical examples. An artificial autoregressive series. It is useful to illustrate these methods first on a series of known type, so that the agreement with theory may be indicated. We consider an artificial series of 480 terms of the form

$$X_t = 1.1X_{t-1} - 0.5X_{t-2} + Y_t \tag{1}$$

constructed by M. G. Kendall (1946); the $Y_t$ were taken directly from a table of random numbers, and were thus not normal. The first fifteen correlations, theoretical and observed, are given in table 4 and fig. 10. From $R_1$ and $R_2$ we calculate $\hat a_1 = 1.132$ (S.E. 0.039), $\hat a_2 = -0.485$ (S.E. 0.039), consistent with the values 1.1 and $-0.5$.
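The figures quoted for the artificial series can be reproduced with a few lines of arithmetic. The sketch below assumes the usual moment (Yule-Walker) equations for a second-order scheme and the standard large-sample variance $(1 - a_2^2)/n$ for either coefficient; the function names are illustrative.

```python
import math


def yule_walker_order2(r1, r2):
    """Solve the two moment equations for a1, a2 from R_1 and R_2."""
    d = 1.0 - r1 * r1
    a1 = r1 * (1.0 - r2) / d
    a2 = (r2 - r1 * r1) / d
    return a1, a2


def theoretical_correlogram(a1, a2, max_lag):
    """rho_0..rho_max_lag for X_t = a1 X_{t-1} + a2 X_{t-2} + Y_t."""
    rho = [1.0, a1 / (1.0 - a2)]
    for s in range(2, max_lag + 1):
        rho.append(a1 * rho[s - 1] + a2 * rho[s - 2])
    return rho


rho = theoretical_correlogram(1.1, -0.5, 15)      # true coefficients
a1_hat, a2_hat = yule_walker_order2(0.762, 0.377)  # observed R_1, R_2
se = math.sqrt((1.0 - (-0.5) ** 2) / 480)          # large-sample S.E.
print([round(r, 4) for r in rho[1:6]], a1_hat, a2_hat, se)
```

The recursion gives the theoretical column of table 4, and the estimates and standard error agree with the quoted values to the rounding of $R_1$ and $R_2$.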


[Fig. 10. Theoretical and observed correlograms of M. G. Kendall's artificial series I.]


The goodness of fit of this series was demonstrated by Quenouille (1947) using the known values of $a_1$ and $a_2$. In practice we should use $\hat a_1$ and $\hat a_2$, but this would not appreciably affect the results, so we quote the values of $H^2R_{s+2}$ ($s = 1, \dots, 15$), and the corresponding cumulative sum of squares of the derived standardized variates, which form a $\chi^2$ with degrees of freedom equal to the number of values taken (for statistically significant values of $\chi^2$, see Fisher and Yates, Statistical Tables, 1938).

Table 4. Correlogram and goodness of fit
(M. G. Kendall's artificial series I)

  s      ρ_s        R_s      H²R_{s+2}     χ²
  1     0.7333     0.762      0.02412     2.32
  2     0.3067     0.377      0.00417     0.07
  3    -0.0293     0.079      0.01979     1.56
  4    -0.1856    -0.067     -0.00812     0.26
  5    -0.1895    -0.078     -0.00013     0.00
  6    -0.1156    -0.039      0.02026     1.62
  7    -0.0325    -0.007     -0.02247     1.98
  8     0.0221     0.022     -0.02903     3.31
  9     0.0406     0.018     -0.01047     0.43
 10     0.0336    -0.036     -0.01226     0.59
 11     0.0166    -0.103      0.00747     0.22
 12     0.0015    -0.145      0.01345     0.70
 13    -0.0067    -0.128     -0.00573     0.13
 14    -0.0081    -0.052     -0.00517     0.10
 15    -0.0056     0.029      0.00249     0.03
 Total                                   13.32

Wolfer's sunspot numbers. This series was the first to be analysed by the autoregressive method in Yule's pioneering paper (1927), and for comparative purposes we have used the graduated series† for the years 1749-1924 given by Yule. The observed values of $R_s$ for $s = 1, \dots, 23$ are given in table 5 and fig. 11 (there are some unimportant small discrepancies between these values and Yule's for $s = 1, \dots, 5$, possibly due to slight differences in the formulae or methods of computation used). Yule fitted a simple second-order autoregressive model from $R_1$ and $R_2$, obtaining $\hat a_1 = 1.5153$, $\hat a_2 = -0.8025$, and the first 20 $\chi^2$ values obtained from $H^2R_{s+2}$ for this model are also given in the table. The total $\chi^2$, with 20 degrees of freedom, is obviously significant (due mainly to the group of high $\chi^2$ values for low $s$). It may be argued that a more consistent model of autoregressive type for these numbers, which represent continuous sunspot activity, would be the continuous process of equation (5), §9.12. This yields estimates from $R_1$ and $R_2$ of $\hat\alpha = 0.3186$, $\hat\beta = 0.3631$. From equation (12), §9.12, the corresponding values of $a$ and $b$ are $-1.4255$ and $0.7272$. This second model appears

† The graduated series was introduced by Yule in an attempt to reduce the effect of any superposed error. Its use here in preference to the primary series may be criticized, but the $R_s$ in table 5 were originally worked out for this series in connection with the study of the continuous model, for which they showed more promise for $s < 5$ (see Bartlett, 1946). They were thereafter made use of for these various further illustrative calculations.
to give slightly better agreement over the first five correlations, but it seems clear that any gain in the goodness of fit near the beginning of the correlogram is counterbalanced by a loss for the later correlations. It must be remembered that an observed correlogram always exhibits less damping than the theoretical, as the observed correlations are inflated by sampling fluctuations (in this case, for the continuous model fitted, the standard error of a correlation has an asymptotic value for large lag of about 0.16); nevertheless, the damping for this second theoretical correlogram appears excessive without detailed test. However, to illustrate the application of the goodness of fit test for the continuous model, the quantities $H^2R_{s+2}$ appropriate to this model are also shown in table 5 with the corresponding $\chi^2$ quantities. (No $\chi^2$ quantity is given for $s = 1$, as we have seen that this would require separate consideration. We have, however, included $s = 2$, assuming that the effect of any non-zero fourth cumulant of the distribution of $Y_t$ is negligible.) Making use of the rough evaluation of significance suggested for this type of application to dependent residuals, we obtain a reduced total $\chi^2$ of 42.85 with $20/1.40 = 14.3 \approx 14$ degrees of freedom, again highly significant.

[Fig. 11. Observed correlogram of Wolfer's sunspot numbers (Yule's graduated series, 1749-1924), compared with the theoretical correlogram of second-order autoregressive model (a) discrete time, (b) continuous time.]
Table 5. Correlogram and goodness of fit for Wolfer's sunspot numbers (Yule's graduated series, 1749-1924)

                    Discrete autoregressive model        Continuous model
  s      R_s        ρ_s      H²R_{s+2}     χ²         ρ_s      H²R_{s+2}     χ²
  1     0.8408    (0.8407)    0.00765     0.92      (0.8407)    0.00054       -
  2     0.4715    (0.4714)    0.03138    15.20      (0.4713)    0.03084     12.60
  3     0.0470     0.0397    -0.02571    10.18       0.0605    -0.02026      5.40
  4    -0.2647    -0.3181     0.02704    11.19      -0.2565     0.02613      8.90
  5    -0.4059    -0.5139     0.01791     4.90      -0.4096     0.02151      6.02
  6    -0.3601    -0.5234     0.016079    4.27      -0.3974     0.02127      5.85
  7    -0.1638    -0.3807     0.01085     1.77      -0.2687     0.01594      3.21
  8     0.1088    -0.1569     0.00538     0.43      -0.0941     0.00832      0.89
  9     0.3646     0.0678     0.00450     0.31       0.0612     0.004750     0.25
 10     0.5197     0.2286     0.006043    0.60       0.1557     0.00394      0.20
 11     0.5273     0.2920     0.01402     2.89       0.1775     0.01041      1.36
 12     0.3937     0.2590     0.00025     0.57       0.1398     0.00465      0.25
 13     0.1791     0.1582     0.00425     0.26       0.0702     0.00354      0.16
 14    -0.0390     0.0318    -0.01467     3.10      -0.0016    -0.01460      2.58
 15    -0.1947    -0.0787     0.00241     0.09      -0.0533    -0.00008      0.00
 16    -0.2706    -0.1448    -0.01200     2.05      -0.0748    -0.01340      2.19
 17    -0.2683    -0.1503     0.02077     6.09      -0.0679     0.01825      3.98
 18    -0.2179    -0.1206     0.00068     0.01      -0.0424     0.00193      0.08
 19    -0.1256    -0.0573     0.01051     1.55      -0.0111     0.01364
 20    -0.0096     0.0099    -0.01519     3.18       0.0150    -0.010608
 21     0.1143     0.0610        -                   0.0295    -0.01499      2.59
 22     0.2035     0.0845                            0.0311
 23     0.2132     0.0791                            0.0229
 Total                                   69.62


These two examples are discussed further in §9.2 from the point of view of harmonic (periodogram) analysis.

The emigration-immigration process. For this process we had the linear regression equation

$$X_t - m = \rho_1 (X_{t-1} - m) + Y_t,$$

where the successive $Y_t$ were necessarily uncorrelated, but only normal and hence independent for large $m$. The series referred to below, for which we are indebted to Lord Rothschild (cf. §8.3), all had means of over 12, and the normal approximation should for such series be sufficient, but a specific theoretical investigation for this process by Miss V. T. Patil has shown that the required asymptotic moment properties of the Quenouille forms are approximately preserved even for finite $m$. The asymptotic normality of the correlations should also be preserved in general owing to their relation to the transition frequencies, the asymptotic normality of which would still hold for probability chains of this type, even although the number of states is not strictly finite.


Table 6. Means and correlograms (Rothschild's data)

          Series 1   Series 2   Series 3   Series 4   Series 5
 n          112        112        113        117        114
 Mean      24.99      16.05      15.89      12.89      20.77
 R_1       0.5426     0.3755     0.5606     0.4038     0.4313
 R_2       0.3109     0.1013     0.3280     0.2055     0.1976
 R_3       0.1976     0.0784     0.2026    -0.0041     0.1617
 R_4       0.0558     0.1050     0.1506    -0.1248    -0.0033
 R_5       0.0560     0.0258     0.1235    -0.1107    -0.0520
 R_6      -0.0608     0.0206     0.1096    -0.0043     0.0024


The relevant numerical results, taken as far as $R_6$, are shown in table 6. The total $\chi^2$'s for the various series, using $R_1$ as an estimate of $\rho_1$ in each case, are 4.09, 1.56, 1.09, 4.13, 3.27; each with 5 degrees of freedom, or 14.14 in all with 25 degrees of freedom. In particular, the $R_2$'s for the five series are satisfactory, and thus no appreciable evidence of any inadequacy of the model is indicated. (The five series have been tested separately. In fact nos. 1 and 5 refer to two series of counts in the same size frame, and nos. 2, 3 and 4 to a second size. A test of homogeneity of $R_1$ and $\hat m$ within these two groups, based on the asymptotic sampling variances of these quantities, reveals no heterogeneity for $\rho_1$, but some for $m$. For further tests on these data, see also, however, Patil, 1957.)


9.2 Harmonic (periodogram) analysis

It has already been pointed out at the beginning of this chapter that the classical approach to harmonic analysis and the corresponding search for periodicities in observed series by means of 'periodogram analysis' become merely one aspect of the analysis of stationary processes, and that only when regarded from this broader basis is a sound interpretation of observational results likely.

The classical analysis of a discrete (equally-spaced) series of observations consists of computing, usually for integral $p$, the quantity

$$J_p \equiv A_p + iB_p = \sqrt{\frac{2}{n}} \sum_{r=1}^{n} X_r e^{ir\omega_p} \tag{1}$$

(by computing $A_p$ and $B_p$ separately), and hence the intensity $I_p = J_pJ_p^*$. The factor $\sqrt{(2/n)}$ has been inserted to make the mean value of $I_p$ equal to $2\sigma^2$ for a completely random series.

It is easily seen that

$$I_p = \frac{2}{n} \sum_{r=1}^{n} \sum_{u=1}^{n} X_r X_u \cos[(r-u)\omega_p], \tag{2}$$

where $\omega_p = 2\pi p/n$, or if we write as before

$$C_s = \frac{1}{n-s} \sum_{r=1}^{n-s} X_r X_{r+s} \quad (s \ge 0,\ C_{-s} = C_s),$$

then

$$I_p = 2 \sum_{s=-n+1}^{n-1} \left(1 - \frac{|s|}{n}\right) C_s \cos(s\omega_p). \tag{3}$$

For convenience we still assume $E\{X\} = 0$, but since formula (1) is not altered when $E\{X\} \ne 0$, a remark on the effect of this will be inserted directly. With this assumption we have on averaging

$$E\{I_p\} = 2\sigma^2 \sum_{s=-n+1}^{n-1} \left(1 - \frac{|s|}{n}\right) \rho_s \cos(s\omega_p). \tag{4}$$

As $n$ increases, this tends for most values of $\omega_p$ to $2\pi\sigma^2 f(\omega_p)$, where $f(\omega)$ is the (continuous) spectrum of $X_t$.
The usual periodogram argument considers what happens when $X_t$ contains a harmonic component of frequency $\lambda$, so that $\rho_s$ contains a component $\cos\lambda s$. Since in this case the spectrum contains a discrete component at $\omega = \lambda$, it is evident that $E\{I_p\}$ will tend to infinity at this point. The precise effect is readily investigated. Putting $\rho_s = \cos\lambda s$ in (4) we find, noting the identities of the sums

$$\sum_{r=1}^{n} \sum_{u=1}^{n} \phi(r-u) = \sum_{s=-n+1}^{n-1} (n - |s|)\,\phi(s)$$

and also the formulae

$$\sum_{s=-n}^{n} \cos sy = \frac{\sin[(n+\tfrac12)y]}{\sin\tfrac12 y}, \qquad \sum_{s=-n+1}^{n-1} (n - |s|)\cos sy = \frac{\sin^2 \tfrac12 ny}{\sin^2 \tfrac12 y},$$

that

$$E\{I_p\} = \frac{\sigma^2}{n} \left\{ \frac{\sin^2[\tfrac12 n(\omega_p + \lambda)]}{\sin^2[\tfrac12(\omega_p + \lambda)]} + \frac{\sin^2[\tfrac12 n(\omega_p - \lambda)]}{\sin^2[\tfrac12(\omega_p - \lambda)]} \right\}. \tag{5}$$

In general, this will be $O(1/n)$, but as $\omega_p \to \lambda$, the second term $\to n\sigma^2$, and $E\{I_p\}$ becomes large, a result which is used to detect the presence of harmonic components in the original series. From the form of the sine terms, the analysis can resolve frequencies which differ by more than $O(1/n)$. The effect of not measuring $X$ from $E\{X\}$ is now apparent; it will merely introduce a spurious discrete component in the spectrum at $\omega = 0$.
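The closed form for the expected intensity in the presence of a harmonic component can be checked against the defining sum. The sketch below evaluates formula (4) with $\rho_s = \cos\lambda s$ term by term and compares it with the expression (5) as read here; the values of `n` and `lam` are arbitrary assumptions, and the function names are illustrative.

```python
import math


def expected_intensity_sum(n, omega, lam, sigma2=1.0):
    """Formula (4) with rho_s = cos(lam * s), summed directly."""
    total = 0.0
    for s in range(-n + 1, n):
        total += (1.0 - abs(s) / n) * math.cos(lam * s) * math.cos(s * omega)
    return 2.0 * sigma2 * total


def expected_intensity_closed(n, omega, lam, sigma2=1.0):
    """Closed form (5); omega +/- lam must not be a multiple of 2*pi."""
    def kernel(x):
        return math.sin(0.5 * n * x) ** 2 / math.sin(0.5 * x) ** 2
    return sigma2 / n * (kernel(omega + lam) + kernel(omega - lam))


n, lam = 100, 0.9                       # assumed series length and frequency
checks = []
for p in (10, 14, 40):                  # omega_p = 2*pi*p/n; p = 14 is nearest lam
    omega = 2.0 * math.pi * p / n
    checks.append((p,
                   expected_intensity_sum(n, omega, lam),
                   expected_intensity_closed(n, omega, lam)))
for row in checks:
    print(row)
```

The two evaluations agree to rounding error, and the value at the Fourier frequency nearest to the assumed $\lambda$ is far larger than at a distant one, which is the effect the classical periodogram search relies on.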
However, for finite $n$, $E\{I_p\}$ from (5) is still finite and it becomes necessary to consider the significance of $I_p$ in comparison with its probable value on other hypotheses. It has been recognized that as the intensity $I_p$ fluctuates about its mean value $2\sigma^2$ for a completely random series, it is advisable to allow for such fluctuations, which are known to follow the probability law

$$P\{I_p > z\} = \exp(-\tfrac12 z/\sigma^2), \tag{6}$$

when $X_t$ is normal (cf. M. G. Kendall, 1946, §30.49).

Once we admit the possibility of continuous spectra beyond the uniform 'noise' spectrum of the completely random series the situation is more complex, for a non-uniform spectrum will have peaks which may be comparable to $E\{I_p\}$ in (5) when $n$ is finite. As a simple example, consider the linear Markov process with correlogram $\rho_s = \rho^{|s|}$. This process has a continuous spectrum given by

$$f(\omega) = \frac{1}{\pi} \sum_{s=-\infty}^{\infty} \rho^{|s|} \cos s\omega = \frac{1}{\pi}\,\frac{1 - \rho^2}{1 + \rho^2 - 2\rho\cos\omega} \quad (0 < \omega < \pi). \tag{7}$$

This function has a maximum at $\omega = 0$, given by

$$f(0) = \frac{1}{\pi}\,\frac{1+\rho}{1-\rho},$$

so that for $\rho$ near 1 this function may be quite large. An example of a series with a more genuine quasi-period is the autoregressive process discussed in §9.1. By the methods of §6.3 we have for the series $X_t = a_1X_{t-1} + a_2X_{t-2} + Y_t$ with solution ($-1 < a_2 < -\tfrac14 a_1^2$),

$$X_t = \sum_{v=-\infty}^{t} \frac{\mu_1^{t+1-v} - \mu_2^{t+1-v}}{\mu_1 - \mu_2}\, Y_v, \tag{8}$$

where $\mu_1$ and $\mu_2$ are the roots of $\mu^2 - a_1\mu - a_2 = 0$, the spectrum

$$f(\omega) = \frac{\sigma^2(Y)}{\pi\sigma^2(X)}\, h(\omega)\,h^*(\omega) \quad (0 < \omega < \pi),$$

where

$$h(\omega) = \sum_{v=0}^{\infty} e^{-iv\omega}\, \frac{\mu_1^{v+1} - \mu_2^{v+1}}{\mu_1 - \mu_2} = \frac{1}{\mu_1 - \mu_2} \left[ \frac{\mu_1}{1 - \mu_1 e^{-i\omega}} - \frac{\mu_2}{1 - \mu_2 e^{-i\omega}} \right],$$

whence

$$f(\omega) = \frac{1}{\pi}\, \frac{(1 + a_2)([1 - a_2]^2 - a_1^2)}{(1 - a_2)([1 + a_2]^2 + a_1^2 - 2a_1(1 - a_2)\cos\omega - 4a_2\cos^2\omega)}, \tag{9}$$

with a peak at $\cos^{-1}[-\tfrac14 a_1(1 - a_2)/a_2]$.
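The two spectra above lend themselves to a quick numerical check. The sketch below codes formulae (7) and (9) as read here, confirms by trapezoidal integration that each integrates to unity over $(0, \pi)$ under this normalization, and locates the peak of the second-order spectrum. The coefficient values are those of the artificial series of §9.13; the function names and the integration routine are illustrative assumptions.

```python
import math


def markov_spectrum(omega, rho):
    """Spectrum (7) of the linear Markov process, normalized on (0, pi)."""
    return (1.0 - rho ** 2) / (math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(omega)))


def ar2_spectrum(omega, a1, a2):
    """Spectrum (9) of X_t = a1 X_{t-1} + a2 X_{t-2} + Y_t."""
    c = math.cos(omega)
    num = (1.0 + a2) * ((1.0 - a2) ** 2 - a1 ** 2)
    den = (1.0 - a2) * ((1.0 + a2) ** 2 + a1 ** 2
                        - 2.0 * a1 * (1.0 - a2) * c - 4.0 * a2 * c * c)
    return num / (math.pi * den)


def ar2_peak(a1, a2):
    """Frequency at which (9) is greatest (complex-root case)."""
    return math.acos(-0.25 * a1 * (1.0 - a2) / a2)


def integrate(f, lo, hi, steps=20000):
    """Plain trapezoidal rule."""
    h = (hi - lo) / steps
    inner = sum(f(lo + k * h) for k in range(1, steps))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


rho = 0.8                                  # assumed Markov correlation
a1, a2 = 1.1, -0.5                         # artificial series of section 9.13
area_markov = integrate(lambda w: markov_spectrum(w, rho), 0.0, math.pi)
area_ar2 = integrate(lambda w: ar2_spectrum(w, a1, a2), 0.0, math.pi)
peak = ar2_peak(a1, a2)
print(area_markov, area_ar2, peak, markov_spectrum(0.0, rho))
```

For these coefficients the peak falls at about 0.60 radians, fairly close to the origin, which is consistent with the heavy damping of the corresponding correlogram.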

For a fixed length of series available it will evidently be more difficult to distinguish strictly harmonic components from peaks of a continuous spectrum than from a uniform 'noise' spectrum, especially in view of sampling fluctuations, the effect of which is considered below. In the case of one or more discrete harmonic components it is in many situations correct to assume that the amplitude of such a component is constant from one series to another. In such a case the estimate of the amplitude has a sampling error which diminishes with $n$, or equivalently $I_p/E\{I_p\} \to 1$ as $n$ increases (the phase of the component may be random from series to series, and in fact for strict validity of the stationarity assumption this is necessary, but this does not affect $I_p$, which is a measure of amplitude and not of phase).

But in problems for which the spectrum of the process is essentially continuous, as in the autoregressive series of equation (11), §9.1, or in the more general linear processes of §6.3, the randomness of the spectral components cannot be isolated into the phase. It might seem possible to imagine physical processes with continuous spectra for which the phases only, and not the amplitudes, of the components are random from series to series. But this implies that the orthogonal function $Z(\omega)$ in the theoretical harmonic analysis of $X_t$ is additive, with, moreover, a fixed spectral contribution at each frequency $\omega$; $X_t$ is then necessarily normal (this is effectively the theorem quoted in §6.4; cf. also Lévy, 1948, pp. 99-100). The process $X_t$ is therefore stochastically identical with any other normal process with the same autocorrelation function.

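For the completely random normal sequence with uniform spectrum we have from (6)

$$\operatorname{var}(I_p) = 4\sigma^4 = [E\{I_p\}]^2, \tag{10}$$

so that fluctuations in $I_p$ tend to be of the same order as the mean value itself. It is shown below that this result remains approximately true in general for linear processes. Consider first the quantity

$$J_Y(\omega) = \sqrt{\frac{2}{n}} \sum_{r=1}^{n} Y_r e^{ir\omega}, \tag{11}$$

so that $J_Y(\omega)$ is defined as $J_p$ was for $X_t$ in (1), but for the residuals $Y_t$. $J_Y(\omega)$ is defined for any $\omega$, including values $2\pi p/n$ for $p$ integral. Then for the expectations

$$E\{J_Y(\omega_1) J_Y(\omega_2)\}, \qquad E\{J_Y(\omega_1) J_Y^*(\omega_2)\}$$

we have the single formula (the minus sign in the $\pm$ corresponding to the second expectation)

$$\frac{2\sigma^2(Y)}{n} \sum_{u=1}^{n} e^{iu(\omega_1 \pm \omega_2)} = \frac{2\sigma^2(Y)}{n}\, \frac{[1 - e^{in(\omega_1 \pm \omega_2)}]\, e^{i(\omega_1 \pm \omega_2)}}{1 - e^{i(\omega_1 \pm \omega_2)}}. \tag{12}$$

In particular, when $\omega_1$ and $\omega_2$ are of the form $2\pi p/n$ this vanishes ($\omega_1 \ne \omega_2$), showing that $J_Y(\omega_1)$ is then uncorrelated with $J_Y(\omega_2)$ and $J_Y^*(\omega_2)$, or equivalently $A_Y(\omega_1)$, $B_Y(\omega_1)$ are uncorrelated with $A_Y(\omega_2)$, $B_Y(\omega_2)$ (similarly also for $J_Y(\omega_1)$ and $J_Y(\omega_2)$ when $\omega_2 = \omega_1$, or equivalently $A_Y(\omega_1)$, $B_Y(\omega_1)$ are uncorrelated and have the same mean square). Further, for the second-order quantities $I_Y(\omega) = J_Y(\omega) J_Y^*(\omega)$, we easily find, either by making use of (12), or by straightforward algebra from the identity (cf. equation (3))

$$I_Y(\omega) = 2 \sum_{s=-n+1}^{n-1} \left(1 - \frac{|s|}{n}\right) C_s \cos s\omega \tag{13}$$

(with $C_s$ here formed from the $Y_r$), that

$$E\{I_Y(\omega_1) I_Y(\omega_2)\} - 4\sigma^4(Y) = \frac{4\kappa_4}{n} + \frac{4\sigma^4(Y)}{n^2} \left[ \frac{1 - \cos n(\omega_1 + \omega_2)}{1 - \cos(\omega_1 + \omega_2)} + \frac{1 - \cos n(\omega_1 - \omega_2)}{1 - \cos(\omega_1 - \omega_2)} \right]. \tag{14}$$

From (14)

(i) $\operatorname{var} I_Y(\omega) \sim 4\sigma^4(Y)$, this result being more general than the corresponding result implicit in (6), as it holds even for $Y_t$ not normal,

(ii) $\operatorname{cov}\{I_Y(\omega_1), I_Y(\omega_2)\}$ ($n|\omega_1 - \omega_2| \gg 1$) is $O(1/n^2)$ for $Y_t$ normal (zero if $\omega$ of form $2\pi p/n$), and $O(1/n)$ if $\kappa_4 \ne 0$.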

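If we now examine the corresponding quantity $J_X(\omega)$ ($= J_p$ for $\omega = 2\pi p/n$), we note when

$$X_r = \sum_{u=-\infty}^{\infty} g_u Y_{r-u} \quad (g_u = 0 \text{ for } u < 0),$$

$$\begin{aligned} J_X(\omega) &= \sqrt{\frac{2}{n}} \sum_{u=-\infty}^{\infty} \sum_{r=1}^{n} g_u Y_{r-u}\, e^{ir\omega} \\ &= \sqrt{\frac{2}{n}} \sum_{u=-\infty}^{\infty} g_u e^{iu\omega} \sum_{r=1}^{n} Y_{r-u}\, e^{i(r-u)\omega} \\ &= h^*(\omega) J_Y(\omega) + O(1/\sqrt{n}), \end{aligned} \tag{15}$$

where $h(\omega)$ is defined (cf. §6.3) as $\sum_u e^{-iu\omega} g_u$, provided $g_u \to 0$ exponentially, say, as $u$ increases, the difference between $J_X(\omega)$ and $h^*(\omega) J_Y(\omega)$ being due to 'end-effects'. The result

$$J_X(\omega) \sim h^*(\omega) J_Y(\omega) \tag{16}$$

for the sample is of considerable importance. It provides at once

$$J_X(\omega) J_X^*(\omega) \sim h(\omega) h^*(\omega) J_Y(\omega) J_Y^*(\omega). \tag{17}$$

Equation (17) gives further

$$E\{J_X(\omega) J_X^*(\omega)\} \sim h(\omega) h^*(\omega) E\{J_Y(\omega) J_Y^*(\omega)\},$$

and, as $n \to \infty$,

$$2\pi\sigma^2(X) f(\omega) = 2\sigma^2(Y) h(\omega) h^*(\omega), \tag{18}$$

consistently with the results of §6.3; but (17) indicates also the asymptotic stochastic relation between the periodogram of a linear process and the periodogram of its residuals with uniform spectrum. It shows that the distribution (6) is still asymptotically true, if we replace $2\sigma^2$ by $\lim_{n\to\infty} E\{I_p\}$ (it is assumed $h(\omega) \ne 0$).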
no 
Smoothing devices. These results show that while fluctuations in $I_p$ do not diminish as the length of series analysed is increased, the correlation between neighbouring $I_p$ and $I_q$ does, so that the observed periodogram will exhibit a wildly fluctuating appearance. This phenomenon has often been observed in practice (see fig. 12), and raises the question of how the spectrum is to be estimated. A suggestion due to P. J. Daniell is to use

$$\frac{1}{2\epsilon} \int_{\omega-\epsilon}^{\omega+\epsilon} I(\omega')\,d\omega'. \tag{19}$$

Its variance relative to the square of $E\{I(\omega)\}$, that is, the square of its coefficient of variation, is from the above results of order $1/(\epsilon n)$ for $\epsilon$ small, $\epsilon n$ large; its bias will be unimportant for $\epsilon$ small if the true spectrum $f(\omega)$ is a reasonably smooth function at $\omega$. Such a smoothing device may be especially useful in the automatic electrical or optical analysis of continuous records, for which a similar fluctuation theory may be shown to apply. The averaging would also be arranged to be automatic (e.g. electrically by use of a filter of appropriate band-width). For arithmetical analysis of discrete series, as in (1) and (3), the procedure derived below (a similar procedure is of course also available for continuous processes) has been found useful.

An average taken from $m$ independent series 'lengths', if available, would possess the usual sampling property of its error being proportional to $1/\sqrt{m}$. If the lengths were taken consecutively as contiguous portions of one total length of series, the correlation between quantities like $J_p$ derived from different portions will be negligible as the length $n$ of each portion increases, provided $\rho_s$ decreases fast enough as $s$ increases. For example, even between adjacent portions, we find


n-i 
E{J (JZ) }= 20? ad, : = tel) ei Dens 


where $J$ and $J'$ are obtained from the two portions, and this expression tends to zero as $n$ increases provided $\rho_{s+n}$ is $O(1/n)$ for fixed $s$. Now averaging $I_p$ for the $m$ subseries gives the formula

$$I_p = 2 \sum_{s=-n+1}^{n-1} \left(1 - \frac{|s|}{n}\right) C_s \cos(s\omega_p),$$

where

$$C_s = \frac{1}{m(n-s)} \sum_{u=0}^{m-1} \sum_{r=1}^{n-s} X_{r+nu} X_{r+s+nu} \quad (s \ge 0).$$

This latter formula ignores, however, the information available in the interval $s$ at the end of each subseries; replacing this by estimating the covariance from the entire data we finally obtain

$$I_p\ (\text{smoothed}) = 2 \sum_{s=-n+1}^{n-1} \left(1 - \frac{|s|}{n}\right) C_s \cos(s\omega_p), \tag{20}$$

where $C_s$ is obtained from the entire series. The choice of $n$ in this formula (for given $nm$) is a compromise between reducing the fluctuations by the approximate factor $1/\sqrt{m}$ and not reducing too much the resolving power which depends on the value of $n$. Formula (20) is related to the harmonic analysis of the correlogram, which would usually have been calculated already, and it is often convenient to use $R_s$ (calculated using deviations from the mean) in place of $C_s$ in (20); this has the advantage of largely eliminating the spurious component at $\omega = 0$ when $E\{X\} \ne 0$, the effect of which will now persist over the frequency interval $2\pi/n$ rather than $2\pi/(nm)$.

From the sampling results for $I_p$ (unsmoothed), the distribution of $mI_p$ (smoothed) would asymptotically be the convolution or resultant of $m$ such distributions, and $I_p$ (smoothed) will hence be proportional to a $\chi^2$ with $2m$ degrees of freedom. For finite lengths of series this asymptotic theory is of course only an approximation, which will still fail even for large $n$ if the autocorrelations are not sufficiently damped, that is, as the series becomes more like a classical harmonic series (see examples below). (For a more general approach to the smoothing problem for continuous spectra, which includes the above methods as special cases, see Grenander (1951).)
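The smoothed periodogram of formula (20) is straightforward to compute from the sample covariances. The following is a minimal sketch, assuming a zero-mean series and using covariances taken from the entire record with divisor $N - s$; the simulated input series, its length and the truncation point `n` are illustrative assumptions.

```python
import math
import random


def sample_covariances(x, max_lag):
    """C_s = sum of x_r * x_{r+s} over the whole series, divided by N - s."""
    big_n = len(x)
    return [
        sum(x[r] * x[r + s] for r in range(big_n - s)) / (big_n - s)
        for s in range(max_lag + 1)
    ]


def smoothed_periodogram(x, n, omega):
    """Formula (20): correlogram truncated at lag n - 1 with weights 1 - |s|/n."""
    c = sample_covariances(x, n - 1)
    total = c[0]
    for s in range(1, n):
        total += 2.0 * (1.0 - s / n) * c[s] * math.cos(s * omega)
    return 2.0 * total


rng = random.Random(2024)
series = [rng.gauss(0.0, 1.0) for _ in range(480)]   # hypothetical random series
n = 15                                                # so that m = 480 / 15 = 32
values = [smoothed_periodogram(series, n, 2.0 * math.pi * p / n) for p in range(n)]
c0 = sample_covariances(series, 0)[0]
print(sum(values) / n, 2.0 * c0)
```

Averaged over the frequencies $2\pi p/n$ ($p = 0, \dots, n-1$), the smoothed intensities return exactly $2C_0$, since the cosine terms cancel; this gives a convenient arithmetic check on any hand or machine computation of (20).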


Example 1. Artificial series. The theory of this section warns us that a straightforward periodogram analysis of the artificial autoregressive series would give violently fluctuating results. If it is known that the series was of second-order autoregressive type, we may from the estimated coefficients compute the spectrum by means of formula (9), but if we wish to estimate it more empirically, we may use formula (20), using the observed correlations $R_s$. Smoothed periodograms, calculated for $n = 15$ ($m = 32$) and $n = 30$ ($m = 16$), are also shown in fig. 12, together with the true spectrum. It will be seen that there is still some divergence between the true and empirical curves. The theoretical peak is not markedly separated from the origin, this corresponding to the heavy damping in the correlogram, and its separation does not emerge at all in the estimated spectrum. However, the divergence does not significantly exceed its theoretical sampling value, which is of the order $\sqrt{(1/m)}$ (coefficient of variation).

[Fig. 12. Periodogram of artificial series I (M. G. Kendall), compared with smoothed periodograms (n = 15 and 30) and with true spectrum. Curves shown: true spectrum, smoothed periodogram, unsmoothed (M. G. Kendall). (Reprinted from Biometrika, 37 (1950), 6.)]

[Fig. 13. Smoothed periodograms (n = 15 and 30) for M. G. Kendall's artificial series III, compared with true spectrum. (Reprinted from Biometrika, 37 (1950), 7.)]


Fig. 14. Smoothed periodogram (n = 24) of Wolfer's sunspot numbers.
(Reprinted from Biometrika, 37 (1950), 10.)


Smoothed periodograms (n = 15 and 30) are also shown in fig. 13 for
a second series of 240 terms (M. G. Kendall's series III), for which
a = −1.1, b = 0.8. In this case the true spectrum has a much more
pronounced peak, and it will be noticed that the amplitude of the peak
of the smoothed periodogram fails to reach the true value, especially
for n = 15. This effect must be expected for small n in the case of
series with lightly damped oscillations. It will be remembered that in
the case of classical harmonic analysis with a discrete spectrum the
true amplitude (with the standardizing factor being used) would be
infinite, whereas that of the periodogram would still be finite. It may
be concluded that excessive curtailing of the correlogram is inadvisable
if the amplitude of a pronounced peak is to be estimated as well as its
location. The smoothed spectrum is only unbiased asymptotically,
approaching the true spectrum if the series is long enough for adequate
smoothing, and yet n is also large enough for adequate resolving power.
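
The effect described in Example 1 can be explored numerically. The following
is a minimal sketch, not the computation used for figs. 12 and 13: it assumes
the autoregressive convention $X_t + aX_{t-1} + bX_{t-2} = Y_t$ with a = −1.1,
b = 0.8, normal residuals, and a correlogram truncated at lag n with linearly
decreasing weights (1 − s/n). The function names, the weights and the
frequency grid are illustrative choices, not taken from the text.

```python
import numpy as np

def ar2_true_spectrum(a, b, omega):
    """Spectral density (up to a constant factor) of X_t + a X_{t-1} + b X_{t-2} = Y_t."""
    z = np.exp(-1j * omega)
    return 1.0 / np.abs(1.0 + a * z + b * z**2) ** 2

def simulate_ar2(a, b, length, rng, burn_in=500):
    """Generate a realization with independent normal residuals (an assumption)."""
    y = rng.standard_normal(length + burn_in)
    x = np.zeros(length + burn_in)
    for t in range(2, length + burn_in):
        x[t] = -a * x[t - 1] - b * x[t - 2] + y[t]
    return x[burn_in:]

def smoothed_periodogram(x, n_trunc, omega):
    """Spectrum estimate from the correlogram truncated at lag n_trunc.

    Uses serial correlations r_s with divisor N and weights (1 - s/n_trunc);
    this weighting is an assumed form, chosen because it keeps the estimate
    non-negative.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    c = np.array([np.dot(x[: N - s], x[s:]) / N for s in range(n_trunc)])
    r = c / c[0]
    s = np.arange(1, n_trunc)
    w = 1.0 - s / n_trunc
    return 1.0 + 2.0 * np.sum((w * r[1:])[:, None] * np.cos(np.outer(s, omega)), axis=0)

rng = np.random.default_rng(0)
a, b = -1.1, 0.8
omega = np.linspace(0.0, np.pi, 721)
series = simulate_ar2(a, b, 240, rng)
true_f = ar2_true_spectrum(a, b, omega)
est_15 = smoothed_periodogram(series, 15, omega)
est_30 = smoothed_periodogram(series, 30, omega)
peak_true = omega[np.argmax(true_f)]
print("true peak frequency:", peak_true)
print("peak of estimate (n=15, n=30):", omega[np.argmax(est_15)], omega[np.argmax(est_30)])
```

The sketch compares peak locations only. To compare peak heights, the true
spectrum and the estimates would first have to be brought to the same
normalization.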

Example 2. Wolfer sunspot numbers. In the case of the Wolfer sunspot
numbers, no simple autoregressive model was adequate. Nevertheless,
the series is one, as Yule pointed out, for which classical harmonic analysis
is of doubtful value, although periodogram analysis of the series has at
times been made. In order to investigate its spectrum without a knowledge
of the underlying mechanism we may proceed by the methods of
this section. The smoothed periodogram (n = 24) is shown in fig. 14.
There is a rise towards ω = 0, which may be partly due to a spurious
component at ω = 0, but possibly also indicates a further component of the
Markov process type or the effect of finitely dependent residuals (the
graduation of the original series by Yule might contribute to this). This
suggests that a somewhat better empirical fit might be obtained for the
correlogram by including an additional effect of this kind in the model,
but such an extension of the empirical model would not greatly contribute
to the interpretation of the observed data, and is not considered
here in detail. Apart from its behaviour at the origin, the spectrum over
the interesting range consists of a smooth hump-backed curve with a peak
at about 11.0 years, in conformity with the usually accepted value of
the 'period'.


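9.21 Further notes and problems related to the
spectrum. The coefficients in the transformation

$$A_p = \sqrt{\frac{2}{n}} \sum_{r=1}^{n} X_r \cos\frac{2\pi p r}{n}, \qquad B_q = \sqrt{\frac{2}{n}} \sum_{r=1}^{n} X_r \sin\frac{2\pi q r}{n}$$

for $p = 0, 1, \ldots, \tfrac12(n-1)$, $q = 1, 2, \ldots, \tfrac12(n-1)$ (n odd), or
$p = 0, 1, \ldots, \tfrac12 n$, $q = 1, 2, \ldots, \tfrac12(n-2)$ (n even) are well known to form an
orthogonal set, so that we have the identity (provided the factor
2 in $\sqrt{(2/n)}$ is omitted for $p = 0$ and $p = \tfrac12 n$)

$$\sum_{r=1}^{n} X_r^2 = \sum_p A_p^2 + \sum_q B_q^2 \tag{1}$$

$$= \sum_{p=0}^{\frac12 n} I_p \quad (\tfrac12(n-1) \text{ instead of } \tfrac12 n \text{ if } n \text{ odd}), \tag{2}$$

where the $I_p$ are the periodogram intensities defined in § 9.2.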
Now reference to the sampling theory of distribution functions
outlined in § 4.1 (see p. 92) will recall that for $X_r$ independent
and normal (zero mean and variance $\sigma^2$), so that $I_p$ has the
exponential distribution, the 'random walk' quantity

$$\sum_{p'=0}^{p} I_{p'} \qquad \Big(\sum_{p'=0}^{\frac12 n} I_{p'} \text{ given equal to its expectation}\Big),$$

or equivalently (from the sampling properties of $I_p$ for normal $X_r$)

$$T_p = \sum_{p'=0}^{p} I_{p'} \Big/ \sum_{p'=0}^{\frac12 n} I_{p'}, \tag{3}$$

say, may be tested against its expectation (~ 2p/n) by checking
that the deviation of $T_p$ does not exceed the $\pm\lambda/\sqrt{(\tfrac12 n)}$ boundary,
the probability of remaining within the boundary being

$$\sum_{s=-\infty}^{\infty} (-1)^s e^{-2s^2\lambda^2} \tag{4}$$

(critical values are 0.95 if λ = 1.36, 0.99 if λ = 1.63).
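
As a numerical illustration of (3) and (4), the following minimal sketch
evaluates the series (4) and applies the boundary test to a series. It is not
taken from the text: the function names are hypothetical, the periodogram
ordinates are computed with NumPy's FFT, and the ordinate at p = 0 is dropped
after mean correction (an assumption made here because that ordinate vanishes
once the sample mean is removed).

```python
import numpy as np

def boundary_probability(lam, terms=100):
    """Evaluate sum over s of (-1)^s exp(-2 s^2 lam^2), truncated at |s| <= terms."""
    s = np.arange(-terms, terms + 1)
    return float(np.sum((-1.0) ** np.abs(s) * np.exp(-2.0 * s**2 * lam**2)))

def cumulative_periodogram(x):
    """Return T_p, the normalized partial sums of the periodogram intensities."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    intensities = np.abs(np.fft.rfft(x)[1:]) ** 2   # p = 1, 2, ..., about n/2
    return np.cumsum(intensities) / np.sum(intensities)

def within_boundary(x, lam=1.36):
    """Check whether T_p stays within lam / sqrt(number of ordinates) of a straight line."""
    t_p = cumulative_periodogram(x)
    m = len(t_p)
    expected = np.arange(1, m + 1) / m
    max_dev = float(np.max(np.abs(t_p - expected)))
    return max_dev <= lam / np.sqrt(m), max_dev

rng = np.random.default_rng(1)
white = rng.standard_normal(400)        # independent normal series (illustrative)
t_p = cumulative_periodogram(white)
print(boundary_probability(1.36), boundary_probability(1.63))
print(within_boundary(white))
```

The two printed probabilities should lie close to the critical values quoted
after (4); the outcome of the boundary check depends on the particular sample.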

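This test is asymptotically valid for $X_r$ independent and
normal, but as the test criterion is a ratio and a function of the
serial correlations $R_s$ of $X_r$ defined in § 9.1, it will be insensitive
to non-normality; in particular the asymptotic variance of $T_p$
will be independent of the fourth cumulant $\kappa_4$ of $X_r$. It may in
fact easily be verified from formula (14), § 9.2, that, writing $S_p$
for the partial sum $\sum_{p'=0}^{p} I_{p'}$,

$$\operatorname{var} T_p \sim \frac{\operatorname{var} S_p}{[E\{S_{\frac12 n}\}]^2} - \frac{2E\{S_p\}\operatorname{cov}\{S_p, S_{\frac12 n}\}}{[E\{S_{\frac12 n}\}]^3} + \frac{[E\{S_p\}]^2 \operatorname{var} S_{\frac12 n}}{[E\{S_{\frac12 n}\}]^4} \sim 4p(1-2p/n)/n^2, \tag{5}$$

independent of $\kappa_4$, and in conformity with a random walk
restricted to $T_{\frac12 n} = 1$.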

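To make use of the above result, we recall further the relation
(equation (15), § 9.2)

$$I_X(\omega) = h(\omega)h^*(\omega)\, I_Y(\omega)\,(1 + O(1/\sqrt{n})),$$

so that if the $X_r$ are not independent but constitute a linear
process with continuous spectrum $f_X(\omega)$, we have

$$I_X(\omega)/[h(\omega)h^*(\omega)] = I_Y(\omega)\,(1 + O(1/\sqrt{n})), \quad (h(\omega) \neq 0),$$

and†

$$\sum_{p'=0}^{p} I_X(\omega')/[h(\omega')h^*(\omega')] = \Big[\sum_{p'=0}^{p} I_Y(\omega')\Big](1 + O(1/\sqrt{n})), \tag{6}$$

as $I_Y(\omega) > 0$, and has the same expectation for different ω. As

$$h(\omega_1)h^*(\omega_1)/f_X(\omega_1) = h(\omega_2)h^*(\omega_2)/f_X(\omega_2),$$

we obtain finally

$$T_p = \sum_{p'=0}^{p} \frac{I_X(\omega')}{f_X(\omega')} \Big/ \sum_{p'=0}^{\frac12 n} \frac{I_X(\omega')}{f_X(\omega')}. \tag{7}$$

† In these sums over p′, ω′ stands of course for 2πp′/n (and ω for 2πp/n).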
The statistic $T_p$ is available for constructing a confidence band
for the entire spectral function $F_X(\omega)$, though as it involves $f_X(\omega)$
in a rather complicated way, it would be more readily available for
testing the goodness of fit of a function $f_X(\omega)$ given a priori.

To avoid the individual weighting of each item $I_X(\omega)$ in (7),
one would as an alternative consider the statistic

$$U_p = \sum_{p'=0}^{p} I_X(\omega') \Big/ \sum_{p'=0}^{\frac12 n} I_X(\omega'). \tag{8}$$


This implies a weighting of each $I_Y(\omega)$ by the factor $f_X(\omega)$. An
appropriate test procedure using $U_p$ needs further investigation,
for $U_p$ is no longer, even for normal processes, equivalent to a
random walk quantity (unless its denominator is fixed in value).
Its variance is, however, still asymptotically independent of
$\kappa_4$ for linear processes, being obtained by similar methods as

$$\Big[\sum_{p'=0}^{\frac12 n} f_X(\omega')\Big]^{-2}\Big[(1-F_p(\omega))^2 \sum_{p'=0}^{p} f_X^2(\omega') + F_p^2(\omega) \sum_{p'=p+1}^{\frac12 n} f_X^2(\omega')\Big], \tag{9}$$

where we have written

$$F_p(\omega) = \sum_{p'=0}^{p} f_X(\omega') \Big/ \sum_{p'=0}^{\frac12 n} f_X(\omega'). \tag{10}$$


If we further standardize $U_p$ by dividing its deviation from
$F_p(\omega)$ by

$$\sqrt{\Big[\tfrac12 n \sum_{p'=0}^{\frac12 n} f_X^2(\omega')\Big]} \Big/ \sum_{p'=0}^{\frac12 n} f_X(\omega') = \nu, \tag{11}$$

it is suggested that the quantity $[U_p - F_p(\omega)]/\nu$ could (as an
interim approximate method) be tested in the same way as
$T_p - 2p/n$. Such an alternative criterion to $T_p$ still only involves
the spectral function $f_X(\omega)$, but it might be convenient to insert
a consistent estimate of ν from the sample, as this would not
affect the test of $U_p$ for large n. Such an estimate is, for example,
obtainable from

$$\tfrac12 n \sum_{p'=0}^{\frac12 n} I_X^2(\omega') \Big/ \Big[\sum_{p'=0}^{\frac12 n} I_X(\omega')\Big]^2, \tag{12}$$

provided we recall that the mean value of $I_X^2(\omega)$ is asymptotically
twice the square of $E\{I_X(\omega)\}$, so that the expression in (12) has
asymptotic value

$$n \sum_{p'=0}^{\frac12 n} f_X^2(\omega') \Big/ \Big[\sum_{p'=0}^{\frac12 n} f_X(\omega')\Big]^2 = 2\nu^2.$$


Grenander and Rosenblatt (1952) suggested the use of the
unweighted and unstandardized quantity

$$V_p = \int_0^{\omega} I_X(\omega')\,d\omega' \sim \frac{2\pi}{n} \sum_{p'=0}^{p} I_X(\omega'); \tag{13}$$

even $U_p$, however, has the advantage, in addition to being
insensitive to $\kappa_4$, of being an estimate of the spectral function
$F_X(\omega)$ corresponding to the correlations, whereas $V_p$ is an estimate
of $\sigma_X^2 F_X(\omega)$, corresponding to the covariances, and this will
usually be of less interest when $\sigma_X^2$ is unknown. In principle,
the first statistic $T_p$ appears superior to $V_p$ or $U_p$.
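
The statistics (7), (8) and (10) can be computed side by side once a
hypothetical spectrum is given. The sketch below is illustrative only: the
array names, the hypothesized spectrum, and the use of equally spaced
ordinates ω′ = 2πp′/n are assumptions, and no significance test for $U_p$
is implied.

```python
import numpy as np

def partial_sum_ratio(values):
    """Cumulative sums of `values` divided by their total."""
    values = np.asarray(values, dtype=float)
    return np.cumsum(values) / np.sum(values)

def spectral_statistics(intensities, f_hyp):
    """Return (T_p, U_p, F_p) for periodogram intensities and a hypothesized spectrum.

    T_p weights each intensity by 1 / f_hyp (as in (7)); U_p uses the raw
    intensities (as in (8)); F_p is the normalized cumulative spectrum (as in (10)).
    """
    intensities = np.asarray(intensities, dtype=float)
    f_hyp = np.asarray(f_hyp, dtype=float)
    t_p = partial_sum_ratio(intensities / f_hyp)
    u_p = partial_sum_ratio(intensities)
    f_p = partial_sum_ratio(f_hyp)
    return t_p, u_p, f_p

# Illustrative data: exponentially distributed intensities scaled by an assumed spectrum.
rng = np.random.default_rng(2)
omega = 2.0 * np.pi * np.arange(1, 101) / 200.0
f_hyp = 1.0 / (1.0 + 0.81 - 1.8 * np.cos(omega))   # Markov-type spectrum, rho = 0.9 (assumed)
intensities = f_hyp * rng.exponential(size=omega.size)
t_p, u_p, f_p = spectral_statistics(intensities, f_hyp)
print(np.max(np.abs(t_p - np.arange(1, 101) / 100.0)), np.max(np.abs(u_p - f_p)))
```

When the hypothesized spectrum is uniform the two statistics coincide, which
is the case of independent $X_r$ treated in (3).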

Mixed spectra. In many problems it is reasonable to suppose 
it is known a priori when the spectrum is discrete, as was always 
assumed classically, or continuous, as in the case of non-deter- 
ministic autoregressive series and linear processes. It might be 
noticed that any additional discrete components with known 
frequency, such, for example, as annual or other seasonal 
variation, can be removed as a first step; any other spectrum 
present complicates the error of estimation of the amplitude and 
phase of such components, but not much more so than in the 
elimination of a simple mean. 

In a second type of problem, it may be required to discriminate
between two hypotheses $H_0$: a continuous spectrum, $H_1$: a
continuous uniform spectrum plus, say, one discrete component
with unknown frequency. Here, as noted in § 9.2, the difficulty
is that for any finite sample the periodogram intensity for the
discrete component is still only finite, and may be confused with
the peak of a continuous spectrum (or vice versa). An appropriate
criterion in such a case, if we may assume the $Y_r$ normal, would
be the likelihood ratio $p_1/p_0$, where the probabilities $p_0$ and $p_1$
for the sample on each hypothesis would require maximizing
for any unknown parameters (these being restricted to a few
autoregressive coefficients, say, in case $H_0$). We have (cf. equation
(2), § 9.11)

$$\log p_1 - \log p_0 \sim -\tfrac12(n-2)(\log \Sigma_1 - \log \Sigma_0), \tag{14}$$

where $\Sigma_0$ denotes an adjusted sum of squares of the residuals
after fitting an autoregressive scheme of order 2, say, and $\Sigma_1$ the
sum of squares of residuals after fitting a discrete component
(two degrees of freedom are allowed in $\Sigma_1$ for the amplitude
and phase, the additional error in locating the frequency being
neglected). The significance of this ratio could be based on
sequential analysis theory (cf. § 4.1, pp. 94-6). We may regard
the entire set of n observations as the first of a number of such
sets, so that for equal maximum risks of error $\epsilon_0 = \epsilon_1 = \epsilon$, $H_0$
or $H_1$ would be adopted according to whether the right-hand
side of (14) was lower than $\log\epsilon - \log(1-\epsilon)$ or higher than
$\log(1-\epsilon) - \log\epsilon$; if it lay between these two values, no decision
could be reached. (Such maximum risks can be rather overcautious
when the probability of a decision with the single
sample available is not small.)
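
The decision rule just described can be written out in a few lines. The
sketch below is a hedged illustration: it assumes that (14) is read as
log p_1 − log p_0, with the sum of squares for the discrete-component
hypothesis in the first logarithm, and the function names and the residual
sums of squares are hypothetical.

```python
import math

def sequential_decision(log_ratio, eps):
    """Decide between H0 and H1 from log p1 - log p0 with equal risks eps.

    Returns "H0", "H1", or None when the ratio lies between the two thresholds.
    """
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie strictly between 0 and 0.5")
    lower = math.log(eps) - math.log(1.0 - eps)
    upper = math.log(1.0 - eps) - math.log(eps)
    if log_ratio < lower:
        return "H0"
    if log_ratio > upper:
        return "H1"
    return None

def log_ratio_from_residuals(n, sum_sq_h0, sum_sq_h1):
    """Approximate log p1 - log p0 as -(n - 2)/2 * (log S1 - log S0)."""
    return -0.5 * (n - 2) * (math.log(sum_sq_h1) - math.log(sum_sq_h0))

# Hypothetical residual sums of squares for a series of 100 terms.
ratio = log_ratio_from_residuals(100, sum_sq_h0=52.0, sum_sq_h1=48.0)
print(ratio, sequential_decision(ratio, eps=0.05))
```

With ε = 0.05 the two thresholds are about ∓2.94, so quite a modest
reduction in the residual sum of squares is enough to reach a decision when
n is of the order of 100.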

In a third class of problem which would fortunately appear 
still rarer in statistical applications, both discrete and continuous 
spectra may have to be considered simultaneously. Whittle 
(1952) has suggested as one possible method an approximate 
standardization of the sample periodogram as in (6) (the continuous
spectrum $f_X(\omega)$ being estimated provisionally by, say,
the smoothing devices of § 9.2) in order to test and remove the
discrete spectrum; one or more repetitions of this procedure 
might be necessary. Further investigation of this and the other 
possible analyses referred to above seems desirable before their 
practical value can be finally assessed (see also Bartlett (1954), 
where some of these tests are illustrated numerically). 


9.3 Multivariate autoregressive series


Discussion so far in this chapter has been confined to the 
analysis of the correlation structure of single series, but the 
simultaneous relations among two or more series may have to 
be considered. For example, econometric models may consist 
of a number of relations with unknown coefficients which have 
to be estimated, and correspondingly, the cross-correlations 
between series will have to be investigated. The problems raised 
are similar in principle to those for a single series, but tend to
become more complex, and are not considered here in complete
detail.

As an example of the more complicated formulae, we quote the
covariance formula for the sample cross-correlation $R_{12}(s)$
between $X_1(t)$ and $X_2(t+s)$ for two real normal sequences
$X_1(t)$, $X_2(t)$:

$$\begin{aligned}
\operatorname{cov}(R_{12}(s), R_{12}(s+t)) \sim \frac{1}{n-s}\sum_{v=-\infty}^{\infty}\{&\rho_{11}(v)\rho_{22}(v+t) + \rho_{21}(v)\rho_{12}(v+t+2s) \\
&+ \rho_{12}(s)\rho_{12}(s+t)[\tfrac12\rho_{11}^2(v) + \rho_{12}^2(v) + \tfrac12\rho_{22}^2(v)] \\
&- \rho_{12}(s)[\rho_{11}(v)\rho_{12}(v+s+t) + \rho_{21}(v)\rho_{22}(v+s+t)] \\
&- \rho_{12}(s+t)[\rho_{11}(v)\rho_{12}(v+s) + \rho_{21}(v)\rho_{22}(v+s)]\}.
\end{aligned} \tag{1}$$

If the sequences are not normal, we must also include in this
formula an expression involving fourth-order cumulants; even
for linear processes this extra term does not necessarily vanish.
If in (1) we put $X_1(t) = X_2(t)$, it reduces to formula (8), § 9.1.


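If, alternatively, $X_1(t)$ and $X_2(t)$ are sequences independent of
each other, we obtain

$$\operatorname{var}(R_{12}(s)) \sim \frac{1}{n-s}\sum_{v=-\infty}^{\infty}\rho_{11}(v)\rho_{22}(v), \qquad \operatorname{cov}(R_{12}(s), R_{12}(s+t)) \sim \frac{1}{n-s}\sum_{v=-\infty}^{\infty}\rho_{11}(v)\rho_{22}(v+t). \tag{2}$$

Corresponding formulae hold for continuous processes, with
integration replacing summation.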
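
As a worked instance of (2), suppose (purely for illustration) that the two
independent sequences are Markov sequences with correlations
$\rho_{11}(v) = a^{|v|}$ and $\rho_{22}(v) = b^{|v|}$. The sum in (2) then has
the closed form $(1+ab)/(1-ab)$, which the sketch below checks numerically;
the function name and the parameter values are hypothetical.

```python
import numpy as np

def cross_correlation_variance(rho11, rho22, n, s, max_lag=2000):
    """Approximate var R_12(s) for independent series as (1/(n-s)) * sum_v rho11(v) rho22(v)."""
    v = np.arange(-max_lag, max_lag + 1)
    return float(np.sum(rho11(v) * rho22(v)) / (n - s))

a, b = 0.8, 0.6            # assumed first serial correlations of the two series
n, s = 200, 0
approx = cross_correlation_variance(lambda v: a ** np.abs(v), lambda v: b ** np.abs(v), n, s)
closed_form = (1.0 + a * b) / (1.0 - a * b) / (n - s)
print(approx, closed_form, 1.0 / n)   # 1/n is the value for two purely random series
```

Serial correlation within each series thus inflates the sampling variance of
the cross-correlation, here by a factor of nearly three, even though the two
series are independent of each other.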

The estimation problem requires a careful formulation of the
equations comprising any multivariate autoregressive model, if
consistent estimates of uniquely defined coefficients are to be
obtainable. The set of equations must in general be considered
together, with a specification of the assumed relation between
the residuals of the different series. For example, with two
coupled series specified by the model (cf. § 5.2, equation (32))

$$X_t + a_{11}X_{t-1} + a_{12}Y_{t-1} = U_t, \qquad Y_t + a_{21}X_{t-1} + a_{22}Y_{t-1} = V_t, \tag{3}$$

while the least-squares estimates of $a_{11}$ and $a_{12}$, say, may
obviously be obtained from the first equation alone if $U_t$ and $V_t$ are
uncorrelated, in the correlated case the least-squares estimates
are defined as the asymptotic maximum-likelihood estimates on
the assumption of normal processes, and involve minimizing
the quadratic expression in the two sets of residuals occurring
in the logarithm of the likelihood function. Fortunately, however,
in the usual case when none of the coefficients is assumed
zero a priori, it is readily shown that the least-squares estimates
of all the coefficients obtained from the separate equations also
satisfy these maximum-likelihood equations, and are consequently
still the optimum estimates. The asymptotic standard
errors of these least-squares estimates may also be derived by
methods similar to those previously used.

Similar methods apply in the case of continuous processes. 
The estimation problem for one or more series will be found 
further discussed in econometric literature, at least in the case 
of discrete time (see, for example, the monograph Statistical 
Inference in Dynamic Economic Models, edited by Koopmans 
(1950), in which some attention is given to the uniqueness or 
‘identification’ problem; cf. also Wold, 1953). 

The methods for testing the goodness of fit of the correlogram
may also be extended to multivariate series. Consider a set of
simultaneous autoregressive sequences, written in vector and
matrix notation,

$$H_t X_t \equiv [I + A_1 E^{-1} + \ldots + A_p E^{-p}] X_t = U_t, \tag{4}$$

where $U_t$ is independent of $U_{t-r}$ (r ≠ 0). It will be convenient to
write equation (4) equivalently in 'tensor' notation, with the
summation convention for any repeated suffices,

$$H_{ij}(t) X_j(t) = U_i(t). \tag{5}$$

$H_t \equiv H_{ij}(t)$ is also a difference operator on the time t, as indicated
by (4). In the notation of (5), we write (as the processes are real,
we revert to the convention for covariance matrices used in § 5.2)

$$C_{ij}(\tau) = \frac{1}{n-\tau}\sum_{u=1}^{n-\tau} X_i(u) X_j(u+\tau), \qquad E\{X_i(u+\tau) X_j(u+t)\} = V_{ij}(t-\tau). \tag{6}$$


For comparing (4) with some wider class of model, likelihood
criteria can be directly obtained on the assumption of normal
residuals as in the univariate case. We shall, however, derive
the extension of the test discussed in detail in § 9.11, which,
though asymptotically equivalent to the likelihood criterion,
depended on the basic autoregressive model only. We have

$$E\{H_{pq}(t) C_{iq}(t)\} = E\Big\{\frac{1}{n-t}\sum_{u=1}^{n-t} X_i(u) U_p(u+t)\Big\} = 0 \quad (t > 0), \tag{7}$$

$$H_{pq}(t) H_{rs}(\tau) V_{sq}(t-\tau) = E\{U_p(t) U_r(\tau)\} = W_{pr}\,\delta(t-\tau), \tag{8}$$

say, where δ(0) = 1, δ(t − τ) = 0 (t ≠ τ). Further

$$E\{H_{pq}(t) C_{iq}(t)\, H_{rs}(\tau) C_{js}(\tau)\} = \frac{1}{(n-t)(n-\tau)} E\Big\{\sum_{u=1}^{n-t}\sum_{v=1}^{n-\tau} X_i(u) U_p(u+t) X_j(v) U_r(v+\tau)\Big\}$$

$$\sim \frac{1}{n-t} E\{U_p U_r\}\, E\{X_i(u) X_j(u+t-\tau)\} = \frac{1}{n-t} W_{pr} V_{ij}(t-\tau) \quad (t \ge \tau > 0). \tag{9}$$

From (8) and (9),

$$E\{H_{ai}(-t) H_{pq}(t) C_{iq}(t)\; H_{bj}(-\tau) H_{rs}(\tau) C_{js}(\tau)\} \sim \frac{1}{n-t} W_{ab} W_{pr}\,\delta(t-\tau) \quad (t, \tau > 0). \tag{10}$$
This is the appropriate generalization of $H_{-t}H_tC_t$ in the
univariate case, but we still have to generalize the more useful form
$H_t^2C_t$, and this is not so immediate, as in general $V_{ij}(t-\tau) \neq V_{ij}(\tau-t)$.
We must look for a new operator $G_{ij}(t)$ such that

$$G_{ai}(t) G_{bj}(\tau) V_{ij}(t-\tau) = T_{ab}\,\delta(t-\tau), \tag{11}$$

where we expect $G_{ij}(t)$ to be associated with the series (5)
reversed in time, and to be of 'length' p. We shall then have as
an alternative to (10)

$$E\{G_{ai}(t) H_{pq}(t) C_{iq}(t)\; G_{bj}(\tau) H_{rs}(\tau) C_{js}(\tau)\} \sim \frac{1}{n-t} W_{pr} T_{ab}\,\delta(t-\tau) \quad (t, \tau \ge p). \tag{12}$$


We need only find $G_{ij}(t)$ for p = 1, for we saw in § 5.2 that the
more general case can always be regarded as a degenerate case
of a first-order equation in more variables. In this case, reverting
to matrix notation, $H_t \equiv I + AE^{-1}$,

$$H_t V_{t-\tau} = H_t E\{X_t X'_\tau\} = V_{t-\tau} + AV_{t-\tau-1} = E\{U_t X'_\tau\} = 0 \quad (t > \tau). \tag{13}$$

From (13) we have (cf. equation (38), § 5.2)

$$V_1 + AV_0 = 0. \tag{14}$$

Suppose we define similarly

$$G_t \equiv I + BE^{+1} \equiv I - V_{-1}V_0^{-1}E^{+1}. \tag{15}$$

Then we shall find that $G_t$ has the required properties, and that

$$G_t X_t = Z_t$$

has similar properties to the original equation, with time
reversed. (The operator $G_t$ is only identical with the original
operator $H_t$ if the covariance matrix $V_{t-\tau}$ is symmetric. It is
interesting to note that this condition is made a basic postulate
in non-equilibrium thermodynamics; see de Groot (1951).)
We have

$$E\{X_{t+\tau} Z'_t\} = E\{X_{t+\tau} X'_t G'_t\}$$

(where the operator $G'_t$ acts on $X'_t$)

$$= V_\tau - V_{\tau-1}V_0^{-1}V_1 = 0 \quad (\tau > 0) \tag{16}$$

from the result $V_\tau = (-A)^\tau V_0$, and also

$$E\{Z_t Z'_\tau\} = E\{G_t X_t X'_\tau G'_\tau\} = E\{(X_t - V_{-1}V_0^{-1}X_{t+1})(X'_\tau - X'_{\tau+1}V_0^{-1}V_1)\}$$

$$= V_{t-\tau} - V_{-1}V_0^{-1}V_{t-\tau+1} - V_{t-\tau-1}V_0^{-1}V_1 + V_{-1}V_0^{-1}V_{t-\tau}V_0^{-1}V_1 = 0 \quad (t \neq \tau). \tag{17}$$

For t = τ, the matrix T introduced in (11) is given by

$$G_t V_{t-\tau} G'_\tau = E\{Z_t Z'_t\} = V_0 + BV_1 = [G_t V_t]_{t=0}, \tag{18}$$

whereas

$$W = E\{U_t U'_t\} = V_0 + AV_{-1} = [H_t V_t]_{t=0}. \tag{19}$$

It should be noted that even if W is diagonal, T is not necessarily so.
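
The relations (14) to (19) can be checked numerically for a particular
first-order model. The sketch below is an illustration under stated
assumptions, not part of the derivation: the coefficient matrix A and the
diagonal residual covariance W are arbitrary illustrative values, $V_0$ is
obtained by solving $V_0 = AV_0A' + W$ directly, and the convention
$V_\tau = E\{X_{t+\tau}X'_t\}$ is assumed.

```python
import numpy as np

def stationary_covariance(A, W):
    """Solve V0 = A V0 A' + W for the model X_t + A X_{t-1} = U_t."""
    k = A.shape[0]
    lhs = np.eye(k * k) - np.kron(A, A)
    return np.linalg.solve(lhs, W.reshape(-1)).reshape(k, k)

def lagged_covariance(A, V0, tau):
    """V_tau = E{X_{t+tau} X_t'}: (-A)^tau V0 for tau >= 0, transpose of V_{-tau} otherwise."""
    if tau >= 0:
        return np.linalg.matrix_power(-A, tau) @ V0
    return lagged_covariance(A, V0, -tau).T

A = np.array([[-0.6, 0.5], [-0.4, -0.9]])   # illustrative coefficients (assumed)
W = np.eye(2)                                # diagonal residual covariance (assumed)
V0 = stationary_covariance(A, W)
V0_inv = np.linalg.inv(V0)

def V(tau):
    return lagged_covariance(A, V0, tau)

B = -V(-1) @ V0_inv                          # backward coefficient, as in (15)
T = V0 + B @ V(1)                            # equation (18)
W_check = V0 + A @ V(-1)                     # equation (19)

def backward_residual_cov(h):
    """E{Z_t Z'_tau} for t - tau = h, expanded as in (17)."""
    return (V(h) - V(-1) @ V0_inv @ V(h + 1) - V(h - 1) @ V0_inv @ V(1)
            + V(-1) @ V0_inv @ V(h) @ V0_inv @ V(1))

print("T =", T)
print("W recovered:", W_check)
print("E{Z_t Z'_(t-1)} =", backward_residual_cov(1))
```

For these values W is diagonal while T has non-zero off-diagonal elements,
which illustrates the remark following (19).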
The result (12) has similar advantages over (10) as in the
univariate case, when estimation of the coefficients is necessary.
For if after estimation we use $h_{pq}(t)$ in place of $H_{pq}(t)$, etc.,

$$(n-t)^{1/2}[g_{ai}(t)h_{pq}(t) - G_{ai}(t)H_{pq}(t)]C_{iq}(t)$$

$$= (n-t)^{1/2}[g_{ai}(t) - G_{ai}(t)]h_{pq}(t)C_{iq}(t) + (n-t)^{1/2}G_{ai}(t)[h_{pq}(t) - H_{pq}(t)]C_{iq}(t) \sim 0 \quad (t \ge p) \tag{20}$$

because $h_{pq}(t)C_{iq}(t) \sim H_{pq}(t)V_{iq}(t) = 0$ (t > 0)
and $G_{ai}(t)C_{iq}(t) \sim G_{ai}(t)V_{iq}(t) = 0$,
from (16).

As T is not in general diagonal even if W is, it is desirable in
(12) to transform the linear forms in $C_{ij}$ further so that they are
uncorrelated also for t = τ. This is relatively simple, when it is
remembered that the latent roots and latent vector components
of a product of the form $W_{pr}T_{ab}$ are given by $\lambda_p\mu_s$, $x_{pj}y_{sk}$, where $\lambda_p$, $x_{pj}$ are
the latent roots and vector components of W, and $\mu_s$, $y_{sk}$ those of
T. Uncorrelated forms may then be obtained by taking linear
combinations of the original forms, with the latent vector
components as coefficients. The asymptotic normality of these
forms is established by extensions of the theorems for the
univariate case.

The extension of these methods to m-dependent and continuous
multivariate series has been indicated by Bartlett and
Rajalakshman (1953). Their discussion includes also a numerical
illustration of the above test on the bivariate artificial series

$$X_t - 0.6X_{t-1} + 0.5Y_{t-1} = U_t, \qquad Y_t - 0.4X_{t-1} - 0.9Y_{t-1} = V_t. \tag{21}$$

The beginning of this series (with $U_t$ and $V_t$ uniform and independent
of each other) is shown in fig. 15, and indicates how such
bivariate series, even of the first order, can exhibit oscillatory
behaviour.

Fig. 15. Realization of a bivariate autoregressive Markov process.
(See equations (21); $X_t$ is the unbroken line in the figure.)

In the paper just referred to, two further points were emphasized
in the course of the numerical check on the artificial
series constructed from series (21). The first is that the totalling
of the individual χ² items (cf. § 9.13) does not check the
orthogonality of these items before squaring, and in the case of
simultaneous series it is at least advisable to check also the absence
of simultaneous correlation between the final linear forms
computed. The second is that the rapidity with which the asymptotic
theory becomes available diminishes as the number k of
degrees of freedom in the total χ² increases, as is indicated by
the following theoretical results for the variance of the total
χ² obtained from the H(t) H(−t) operator for an autoregressive
series given a priori (this test we have seen is equivalent to testing
the direct lagged correlations of the residuals). The mean χ² is
equal to the number of items k.

One series: Variance of χ² =


i 1 
an * uU + 2yg) + (5 + 4704 1y8)] +o(3))- 


Two series: Variance of χ² =


sefia WOE + 2870) i-e, typ e o(3)]. 
where k = 4k′ (k′ being comparable to k for a single series,
indicating the order of the maximum correlation lag considered),
and $\gamma_2$ is the coefficient of kurtosis $\kappa_4/\kappa_2^2$ (depending on the
departure from the normal or Gaussian distribution in the fourth
moment) in the distribution of the residuals.
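
The oscillatory behaviour of the bivariate series (21) can be reproduced with
a short simulation. This is a sketch under assumptions, not the construction
used by Bartlett and Rajalakshman: the coefficients are as read from (21),
the residuals are taken as uniform on (−½, ½), and the series length, seed
and start-up period are arbitrary.

```python
import numpy as np

# Coefficient matrix of X_t = PHI @ X_{t-1} + U_t, as read from equations (21).
PHI = np.array([[0.6, -0.5],
                [0.4, 0.9]])

def simulate_bivariate(phi, length, rng, burn_in=200):
    """Simulate a first-order bivariate autoregressive series with uniform residuals."""
    x = np.zeros((length + burn_in, 2))
    u = rng.uniform(-0.5, 0.5, size=(length + burn_in, 2))  # independent U_t, V_t (assumed range)
    for t in range(1, length + burn_in):
        x[t] = phi @ x[t - 1] + u[t]
    return x[burn_in:]

roots = np.linalg.eigvals(PHI)
modulus = np.abs(roots[0])
period = 2.0 * np.pi / np.abs(np.angle(roots[0]))
series = simulate_bivariate(PHI, 100, np.random.default_rng(3))
print("latent roots:", roots)
print("damping per step:", modulus, "quasi-period in steps:", period)
print(series[:5])
```

The latent roots of the coefficient matrix are complex with modulus less than
one, which is the source of the damped oscillation visible in a realization,
even though each equation is only of the first order.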